By Topic

Reservoir Sampling over Memory-Limited Stream Joins

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Al-Kateb, M. ; Univ. of Vermont, Burlington ; Byung Suk Lee ; Wang, X.S.

In stream join processing with limited memory, uniform random sampling is useful for approximate query evaluation. In this paper, we address the problem of reservoir sampling over memory-limited stream joins. We present two sampling algorithms, reservoir join-sampling (RJS) and progressive reservoir join-sampling (PRJS). RJS is designed straightforwardly by using a fixed-size reservoir sampling on a join-sample (i.e., random sample of a join output stream). Anytime the sample in the reservoir is used, RJS always gives a uniform random sample of the original join output stream. With limited memory, however, the available memory may not be large enough even for the join buffer, thereby severely limiting the reservoir size. PRJS alleviates this problem by increasing the reservoir size during the join-sampling. This increasing is possible since the memory requirement by the join-sampling algorithm decreases over time. A larger reservoir provides a closer representation of the original join output stream. However, it comes with a negative impact on the probability of the sample being uniform. Through experiments we examine the tradeoffs and compare the two algorithms in terms of the aggregation error on the reservoir sample.

Published in:

Scientific and Statistical Database Management, 2007. SSBDM '07. 19th International Conference on

Date of Conference:

9-11 July 2007