Sliding HyperLogLog: Estimating Cardinality in a Data Stream over a Sliding Window

Author(s):

Chabchoub, Y. (BILab, Telecom ParisTech, Paris, France); Hebrail, G.

This paper proposes a new algorithm for estimating the number of active flows in a data stream. The algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. Its advantage is that it can estimate, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about 1.04/√m (the same as in the HyperLogLog algorithm), where m is the number of registers used. Because the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that the total required memory is at most 5m ln(n/m) bytes, where n is the actual number of flows in the sliding window. For instance, with only 35 kB of memory, a standard error of about 3% can be achieved for a data stream of several million flows. The theoretical results are validated on both real and synthetic traffic.
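The two formulas quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the values m = 1024 and n = 1,000,000 are assumptions chosen to land near the abstract's figures. The abstract itself does not state which m and n produce its 35 kB example.

```python
import math


def hll_standard_error(m: int) -> float:
    """Standard error of about 1.04 / sqrt(m), with m the number of registers."""
    return 1.04 / math.sqrt(m)


def sliding_hll_memory_bound_bytes(m: int, n: int) -> float:
    """Upper bound 5 * m * ln(n / m) bytes, with n the number of flows in the window."""
    return 5 * m * math.log(n / m)


# Assumed values for illustration; not taken from the abstract.
m = 1024          # number of registers
n = 1_000_000     # distinct flows in the sliding window

print(f"standard error: {hll_standard_error(m):.4f}")
print(f"memory bound:   {sliding_hll_memory_bound_bytes(m, n) / 1000:.1f} kB")
```

Under these assumed values the standard error is about 3.25% and the bound is about 35 kB, which is in line with the abstract's example. Because the bound grows only logarithmically in n, a larger flow count raises it slowly.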

Published in:

Data Mining Workshops (ICDMW), 2010 IEEE International Conference on

Date of Conference:

13 Dec. 2010