Scheduled System Maintenance:
On May 6th, single article purchases and IEEE account management will be unavailable from 8:00 AM - 5:00 PM ET (12:00 - 21:00 UTC). We apologize for the inconvenience.
By Topic

Load Shedding for Window Joins on Multiple Data Streams

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Yan-Nei Law ; Bioinf. Inst., Singapore ; Zaniolo, C.

We consider the problem of semantic load shedding for continuous queries containing window joins on multiple data streams and propose a robust approach that is effective with the different semantic accuracy criteria that are required in different applications. In fact, our approach can be used to (i) maximize the number of output tuples produced by joins, and (ii) optimize the accuracy of complex aggregates estimates under uniform random sampling. We first consider the problem of computing maximal subsets of approximate window joins over multiple data streams. Previously proposed approaches are based on multiple pair-wise joins and, in their load-shedding decisions, disregard the content of streams outside the joined pairs. To overcome these limitations, we optimize our load-shedding policy using various predictors of the productivity of each tuple in the window. To minimize processing costs, we use a fast-and-light sketching technique to estimate the productivity of the tuples. We then show that our method can be generalized to produce statistically accurate samples, as needed in, e.g.. the computation of averages, quantiles. and stream mining queries. Tests performed on both synthetic and real-life data demonstrate that our method outperforms previous approaches, while requiring comparable amounts of time and space.

Published in:

Data Engineering Workshop, 2007 IEEE 23rd International Conference on

Date of Conference:

17-20 April 2007