Data streaming systems are becoming essential for monitoring applications such as financial analysis, network intrusion detection, and sensor networks. These systems often have to process multiple similar but distinct continuous aggregation queries simultaneously. Since executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting similarities among the queries. The challenge is to identify overlapping computations that may not be obvious from the queries themselves. In this paper, we reveal new opportunities for sharing work in the context of distributed aggregation queries that vary in their group-by predicates. We identify settings in which a large set of m such queries can be answered by executing n < m different queries. The n queries are revealed by analyzing a binary two-dimensional array that captures the connections among the queries. We propose a novel algorithmic solution to the problem of finding the minimum number of queries in such a distributed-streams setting, in order to optimize the communication cost across the network. The experimental results show that our approach gives as much as an order-of-magnitude performance improvement over the no-sharing setting.
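
To make the idea of answering m queries with n < m executions concrete, the following minimal Python sketch shows only the simplest form of sharing: collapsing identical rows of a binary query matrix. It is not the algorithm proposed in the paper, which is not described in this abstract. The meaning of the rows and columns (one row per query, one column per hypothetical predicate or data partition), the function name `distinct_queries`, and the sample data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def distinct_queries(
    matrix: List[List[int]],
) -> Tuple[List[Tuple[int, ...]], Dict[int, int]]:
    """Collapse identical rows of a binary query matrix.

    Assumption: each row describes one query and each column one
    hypothetical predicate or partition that the query touches.
    Returns the distinct rows (the queries that must actually run) and a
    map from each original query index to the distinct row answering it.
    """
    seen: Dict[Tuple[int, ...], int] = {}
    distinct: List[Tuple[int, ...]] = []
    answered_by: Dict[int, int] = {}
    for i, row in enumerate(matrix):
        key = tuple(row)
        if key not in seen:
            # First time this row pattern appears: it must be executed.
            seen[key] = len(distinct)
            distinct.append(key)
        # Every query is answered by the first row with the same pattern.
        answered_by[i] = seen[key]
    return distinct, answered_by


# Hypothetical input: m = 4 queries over 3 columns.
example = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
]
rows, mapping = distinct_queries(example)
print(len(example), "queries answered by", len(rows), "executions")
```

In this toy input, queries 0 and 2 share one execution, as do queries 1 and 3. Duplicate detection is only the most obvious case; the abstract's point is that less obvious overlaps can also be found by analyzing the same kind of array.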