Abstract:
Online aggregation is a widely used technique that returns approximate query results computed over random samples, giving users a fast way to trade off time against accuracy. The key issue in online aggregation is how to guarantee the efficiency and effectiveness of random sample collection. State-of-the-art approaches obtain uniform samples either by random sampling or by sequential sampling with preprocessing. The former suffers from inefficient random access to the whole dataset, especially for skewed data distributions, and the latter is limited by time-consuming preprocessing. To make sampling more efficient, we propose a scalable sampling algorithm called logic-partition based Gaussian sampling. The basic idea of our solution is to convert random sampling into near-sequential sampling without any extra preprocessing, achieving a balance between sampling efficiency and sample quality. Extensive experiments using the TPC-H benchmark with skewed data distributions demonstrate the superior performance of our solution.
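To make the time/accuracy trade-off concrete, the following is a minimal sketch of generic online aggregation for an AVG query. It is not the logic-partition based Gaussian sampling algorithm proposed in the paper. The function name `online_average` and the parameters `batch_size`, `z`, and `seed` are assumptions introduced for illustration. The sketch visits rows in a uniformly random order (the plain random sampling baseline), maintains a running mean and variance, and reports an approximate 95% confidence interval from the normal approximation, ignoring any finite-population correction.

```python
import math
import random


def online_average(values, batch_size=1000, z=1.96, seed=0):
    """Yield (samples_seen, estimate, half_width) after each batch.

    Hypothetical helper for illustration only: rows are visited in a
    uniformly random order, which is the random-access pattern the
    abstract describes as inefficient on large datasets.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # uniform random visiting order

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for i in order:
        x = values[i]
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0 or n == len(values):
            variance = m2 / (n - 1) if n > 1 else 0.0
            # Normal-approximation interval: estimate +/- half_width
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


if __name__ == "__main__":
    data = [float(i % 100) for i in range(10000)]
    for seen, estimate, half_width in online_average(data):
        print(f"{seen:6d} rows: AVG ~ {estimate:.3f} +/- {half_width:.3f}")
```

A user can stop consuming the generator as soon as the interval is narrow enough, which is the trade-off between time and accuracy that online aggregation offers.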
Date of Conference: 13-16 August 2017
Date Added to IEEE Xplore: 07 September 2017