Sampling-based selectivity estimation for joins using augmented frequent value statistics

2 Author(s)
Haas, P.J. ; IBM Almaden Res. Center, San Jose, CA, USA ; Swami, A.N.

We empirically compare the cost of estimating the selectivity of a star join using the sampling-based t-cross procedure to the cost of computing the join and obtaining the exact answer. The relative cost of sampling can be excessive when a join attribute value exhibits “heterogeneous skew.” To alleviate this problem, we propose Algorithm TCM, a modified version of t-cross that incorporates “augmented frequent value” (AFV) statistics. We provide a sampling-based method for estimating AFV statistics that does not require indexes on attribute values, requires only one pass through each relation, and uses an amount of memory much smaller than the size of a relation. Our experiments show that the use of estimated AFV statistics can reduce the relative cost of sampling by orders of magnitude. We also show that the use of estimated AFV statistics can reduce the relative error of the classical System R selectivity formula.
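
The abstract does not restate the System R formula or define AFV statistics, so the following is only a rough illustration of why skew matters, not the authors' method. It assumes the commonly cited System R estimate for a two-way equijoin, 1 / max(d_R, d_S), where d_R and d_S are the numbers of distinct join-attribute values, and compares it with the exact selectivity on toy data. The function names and the data are hypothetical.

```python
from collections import Counter

def exact_join_selectivity(r, s):
    """Exact selectivity of the equijoin R.a = S.a: |R join S| / (|R| * |S|)."""
    freq_r, freq_s = Counter(r), Counter(s)
    # Each value v contributes freq_r[v] * freq_s[v] joined pairs.
    join_size = sum(n * freq_s[v] for v, n in freq_r.items())
    return join_size / (len(r) * len(s))

def system_r_selectivity(r, s):
    """Assumed classical estimate: 1 / max(distinct values in R.a, distinct values in S.a)."""
    return 1.0 / max(len(set(r)), len(set(s)))

# Toy join columns (hypothetical): 100 rows each, 4 distinct values.
uniform_r = [0, 1, 2, 3] * 25          # every value appears 25 times
uniform_s = [0, 1, 2, 3] * 25
skewed_r = [0] * 97 + [1, 2, 3]        # one value dominates
skewed_s = [0] * 97 + [1, 2, 3]

for name, r, s in [("uniform", uniform_r, uniform_s),
                   ("skewed", skewed_r, skewed_s)]:
    print(name, exact_join_selectivity(r, s), system_r_selectivity(r, s))
```

On the uniform columns the two numbers agree. On the skewed columns the estimate stays at 0.25 while the exact selectivity is far larger, which is the kind of error that statistics on frequent values are meant to reduce.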

Published in:

Data Engineering, 1995. Proceedings of the Eleventh International Conference on

Date of Conference:

6-10 Mar 1995