Skip to Main Content
Efficient one-pass computation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating F0 of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm which gives an (ε, δ) approximation of F0, with the following time complexity (n is the size of the universe of the items): (1) the amortized processing time per interval is O(log1/δ log n/ε). (2) The time to answer a query for F0 is O(log1/δ). The workspace used is O(1/ε2log1/δlogn) bits. Our algorithm improves upon a previous algorithm by Bar-Yossef Kumar and Sivakumar (2002), which requires O(1/ε5log1/δlog5n) processing time per item. Our algorithm can be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the current best bounds due to Cormode and Muthukrishnan (2003). This also provides efficient and novel solutions for data aggregation problems in sensor networks studied by Nath and Gibbons (2004) and Considine et. al. (2004).
Date of Conference: 5-8 April 2005