In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on samples. However, uniformly extracted samples often fail to guarantee acceptable accuracy for estimates over individual grouping intervals. Such accuracy is crucial in most less-aggregated analyses, which are mostly based on recent data (e.g., forecasting, performance analysis). We propose the use of time-interval stratified samples (TISS), a simple sampling strategy that biases the sample towards recent data. This improves accuracy in important less-aggregated analyses without significantly deteriorating aggregated analyses on older data. For queries analyzing recent data, TISS obtains much better accuracy than either uniform samples or the recently proposed congressional samples (CS), and it can be coupled with CS to provide minimal representation guarantees (TISS-CS). We discuss the TISS design, the loading process, and the query processing middle layer. We show that TISS is easily integrated into a data warehouse and works transparently. TISS is evaluated experimentally in a TPC-H setup.
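The idea of a time-interval stratified sample can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `tiss_sample` and `estimate_sum`, the per-period sampling rates, and the weighting scheme (standard stratified scale-up by population size over sample size) are all assumptions made for illustration. Rows are grouped into time periods, recent periods are sampled at a higher rate, and each sampled row carries a weight so that aggregates can be scaled back up.

```python
import random
from collections import defaultdict


def tiss_sample(rows, rates, seed=0):
    """Draw a stratified sample with one stratum per time period.

    rows  : iterable of (period, value) pairs (hypothetical fact rows)
    rates : dict mapping period -> sampling fraction in (0, 1];
            assumed to be higher for recent periods
    Returns a list of (period, value, weight) tuples, where weight is
    the stratum size divided by the number of rows sampled from it.
    """
    rng = random.Random(seed)
    by_period = defaultdict(list)
    for period, value in rows:
        by_period[period].append(value)

    sample = []
    for period, values in by_period.items():
        # Keep at least one row per period, and never more than exist.
        k = min(len(values), max(1, round(len(values) * rates[period])))
        weight = len(values) / k
        for value in rng.sample(values, k):
            sample.append((period, value, weight))
    return sample


def estimate_sum(sample, period=None):
    """Estimate SUM(value), optionally restricted to one period."""
    return sum(v * w for p, v, w in sample if period is None or p == period)


if __name__ == "__main__":
    # Hypothetical data: 100 rows in an old period, 100 in a recent one.
    rows = [("old", 1.0)] * 100 + [("recent", 2.0)] * 100
    # Assumed rates: keep all recent rows, only 10% of old rows.
    sample = tiss_sample(rows, {"old": 0.1, "recent": 1.0})
    print(len(sample), estimate_sum(sample, "recent"), estimate_sum(sample))
```

With a rate of 1.0 the recent period is answered exactly, while older periods are estimated from fewer rows; this is the trade-off the abstract describes, shown here under the stated assumptions only.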
Date of Conference: 26-28 March 2003