Time-stratified sampling for approximate answers to aggregate queries

Authors:
Costa, J.P. (Dept. Informatica e de Sistemas, Instituto Superior de Engenharia, Coimbra, Portugal); Furtado, P.

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on samples. However, uniformly extracted samples often do not guarantee acceptable accuracy in grouping interval estimations. This is crucial in most less-aggregated analyses, which are mostly based on recent data (e.g., forecasting, performance analysis). We propose the use of time-interval stratified samples (TISS), a simple sampling strategy that biases towards recency. This improves accuracy in important less-aggregated analyses without significantly deteriorating aggregated analyses of older data. TISS obtains much better accuracy than either uniform samples or the recently proposed congressional samples (CS) for queries analyzing recent data, and it can be coupled with CS to provide minimal representation guarantees (TISS-CS). We discuss the TISS design, the loading process, and the query processing middle layer. We show that TISS is easily integrated into a data warehouse and works transparently. TISS is evaluated experimentally in a TPC-H setup.
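
The idea of a recency-biased, time-stratified sample can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the stratum boundaries, the sampling rates, or the estimator, so the `STRATA` values, the per-row Bernoulli sampling, the inverse-rate weighting, and all function names below are assumptions chosen for illustration only.

```python
import random
from datetime import date

# Hypothetical strata: (maximum age in days, sampling rate).
# These boundaries and rates are illustrative, not taken from the paper.
STRATA = [(30, 0.50), (365, 0.10), (float("inf"), 0.01)]

def sampling_rate(row_date, today, strata=STRATA):
    """Return the sampling rate of the time stratum that row_date falls into."""
    age_days = (today - row_date).days
    for max_age, rate in strata:
        if age_days <= max_age:
            return rate
    return strata[-1][1]

def build_sample(rows, today, rng, strata=STRATA):
    """Keep each (row_date, value) row with the probability of its stratum.

    Every kept row carries a weight of 1/rate so aggregates can be scaled up.
    """
    sample = []
    for row_date, value in rows:
        rate = sampling_rate(row_date, today, strata)
        if rng.random() < rate:
            sample.append((row_date, value, 1.0 / rate))
    return sample

def estimate_sum(sample, start, end):
    """Estimate SUM(value) over rows with start <= row_date <= end."""
    return sum(value * weight for d, value, weight in sample if start <= d <= end)
```

Because recent rows are kept at a higher rate, a query restricted to recent dates is answered from more sampled rows than a uniform sample of the same size would provide, while older data remains represented at a lower rate. A call such as `build_sample(rows, date(2003, 3, 28), random.Random(0))` followed by `estimate_sum(sample, start, end)` would give an approximate sum for the chosen date range.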

Published in:

Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on

Date of Conference:

26-28 March 2003