![Concise and Accurate Data Summaries for Fast Approximate Query Answering [microform]](/_next/image?url=https%3A%2F%2Fstorage.googleapis.com%2Fmenrva_img_storage%2Fcovers%2Fmenrva-default-cover.jpg&w=750&q=85)
Many techniques have been proposed to support fast approximate query answering using summarized information of the data. Among them, histogram techniques and wavelet techniques are two popular types that have been extensively studied. Second, we present a thorough experimental evaluation of previously proposed histogram and wavelet techniques. In this thesis, we investigate the trade-off between the space used and the accuracy of various histogram and wavelet techniques. We also examine their construction costs and query answering time. The major contributions of this thesis are as follows. Finally, we identify the characteristics of the data for which wavelet techniques perform poorly or excellently. We present an algorithm, called the Majorization Ranking Test (MRT) algorithm, to quickly determine which wavelet technique to use for fast approximate query answering (if any). The MRT algorithm also allows us to decide whether to use wavelet techniques or histogram techniques. We also present a new family of wavelet techniques, the Space Efficient Wavelet (SEW) techniques, which improve on previously proposed wavelet techniques by utilizing space in a more efficient way. We show that the SEW techniques dominate previously proposed wavelet techniques in both one-dimensional and multi-dimensional cases. Third, we present a new family of histograms, the Hierarchical Model Fitting (HMF) histograms, based on the Minimum Description Length (MDL) principle, which has been widely used for model selection in statistics and machine learning. The one-dimensional HMF histogram is applicable to one-dimensional data, and the multi-dimensional HMF histogram is applicable to multi-dimensional data. The HMF histograms can be constructed to either seek the highest possible accuracy within a given space budget, or seek the most concise representation that leads to accuracy within a specified tolerance. We show that the HMF histograms are capable of providing more accurate approximation
Page Count:
576
Publication Date:
2004-01-01
ISBN-10:
0612916731
ISBN-13:
9780612916739
No comments yet. Be the first to share your thoughts!