In this paper, a new algorithm for estimating the number of active flows in a data stream is proposed. This algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. Its advantage is that it can estimate, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about 1.04/√m (the same as in the HyperLogLog algorithm), where m is the number of registers used. Because the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that the total required memory is at most 5m ln(n/m) bytes, where n is the actual number of flows in the sliding window. For instance, with only 35 kB of memory, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
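To make the sliding window idea and the two formulas above concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not spell out the per-register data structure, so the sketch assumes that each register keeps a short list of (timestamp, rank) pairs from which the maximum rank over any recent sub-window can be recovered. The class and function names, the choice of BLAKE2b as hash function, and the default of m = 1024 registers are all hypothetical; the estimator constants are the usual HyperLogLog ones, including the small-range correction.

```python
import hashlib
import math


def standard_error(m):
    """Relative standard error quoted in the abstract: about 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)


def memory_bound_bytes(m, n):
    """Memory upper bound quoted in the abstract: 5 * m * ln(n / m) bytes."""
    return 5 * m * math.log(n / m)


class SlidingHLLSketch:
    """Illustrative sliding-window variant of HyperLogLog (assumed design)."""

    def __init__(self, window, b=10):
        self.window = window          # maximum queryable duration
        self.b = b                    # number of index bits
        self.m = 1 << b               # number of registers (assumed >= 128)
        # Assumption: each register stores (timestamp, rank) pairs with
        # increasing timestamps and strictly decreasing ranks.
        self.registers = [[] for _ in range(self.m)]

    def _hash64(self, item):
        # Hypothetical hash choice: 64-bit BLAKE2b digest of the flow id.
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item, timestamp):
        """Record one packet; timestamps are assumed to be non-decreasing."""
        h = self._hash64(item)
        idx = h >> (64 - self.b)                     # first b bits pick the register
        rest = h & ((1 << (64 - self.b)) - 1)        # remaining 64 - b bits
        rank = (64 - self.b) - rest.bit_length() + 1  # position of leftmost 1-bit
        # Drop entries that left the window or that the new, more recent
        # entry makes useless (their rank is not larger than the new one).
        kept = [(t, r) for (t, r) in self.registers[idx]
                if t > timestamp - self.window and r > rank]
        kept.append((timestamp, rank))
        self.registers[idx] = kept

    def estimate(self, now, duration):
        """Estimate the number of distinct flows seen in (now - duration, now]."""
        if duration > self.window:
            raise ValueError("duration must not exceed the sliding window length")
        maxima = []
        for entries in self.registers:
            ranks = [r for (t, r) in entries if t > now - duration]
            maxima.append(max(ranks, default=0))
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)             # HyperLogLog constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in maxima)
        zeros = maxima.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)           # small-range (linear counting) correction
        return raw
```

As a sanity check on the figures in the abstract, `standard_error(1024)` evaluates to 0.0325 and `memory_bound_bytes(1024, 10**6)` to a little over 35,000 bytes, which is consistent with the quoted 3% error at about 35 kB. A single sketch with `window=10000` can then be queried with `estimate(now, 10000)` or `estimate(now, 1000)` without storing anything extra, which is the flexibility the abstract describes.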