Distributed stream query services must process a large number of complex, continuous queries simultaneously, under stringent performance requirements, while making use of distributed processing resources. In this paper we present the design and evaluation of a distributed stream query service that achieves massive scalability, a key design principle for such systems, by reusing the same distributed operator across multiple, distinct concurrent queries. We present concrete techniques that exploit the well-defined semantics of CQL-style queries to reduce query deployment cost and duplicate processing, thereby increasing system throughput and scalability. Our system exhibits several unique features, including: (1) a 'reuse lattice' that encodes both operator similarity and network locality in a uniform data structure; (2) techniques to generate an optimized query grouping plan in the form of 'relaxed operators' that capitalize on reuse opportunities while taking into account multiple run-time variations, such as network locality, data rates, and operator lifetime; and (3) techniques to modify operator semantics at runtime to facilitate reuse. Evaluation of our service-oriented design and techniques under realistic workloads shows that stream queries relaxed and grouped using our approach operate efficiently without a priori knowledge of the workload and offer an order-of-magnitude improvement in performance over existing approaches.
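To make the idea of a 'relaxed operator' concrete, the following is a minimal sketch under simplifying assumptions: every query is a single-attribute range selection (roughly `SELECT * FROM S WHERE low <= value <= high`), and grouping simply replaces the member predicates with their interval hull. The names `RangeQuery`, `relax`, and `run_shared` are hypothetical illustrations, not the authors' API, and the hull rule is a generic predicate-relaxation technique, not the paper's grouping algorithm. The paper's planner also weighs network locality, data rates, and operator lifetime, none of which is modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical CQL-style selection: keep tuples with low <= value <= high."""
    qid: str
    low: float
    high: float

    def matches(self, value: float) -> bool:
        return self.low <= value <= self.high


def relax(queries: List[RangeQuery]) -> RangeQuery:
    """Build one shared operator whose predicate covers every member query's range.

    Assumption: relaxation = interval hull of the member ranges.
    """
    return RangeQuery("relaxed",
                      min(q.low for q in queries),
                      max(q.high for q in queries))


def run_shared(stream: Iterable[float],
               queries: List[RangeQuery]) -> Dict[str, List[float]]:
    """Evaluate all queries through a single relaxed operator plus residual filters."""
    shared = relax(queries)
    results: Dict[str, List[float]] = {q.qid: [] for q in queries}
    for value in stream:
        # The relaxed predicate is evaluated once per tuple; tuples outside
        # the hull are discarded before any per-query work is done.
        if not shared.matches(value):
            continue
        # Residual per-query filters restore each query's exact semantics.
        for q in queries:
            if q.matches(value):
                results[q.qid].append(value)
    return results


if __name__ == "__main__":
    qs = [RangeQuery("q1", 10, 50), RangeQuery("q2", 30, 80)]
    print(relax(qs))
    print(run_shared([5, 20, 40, 70, 90], qs))
```

With the two overlapping queries above, the shared operator covers `[10, 80]`; tuples 5 and 90 are dropped once instead of twice, and the residual filters yield `q1: [20, 40]` and `q2: [40, 70]`. Note that if member ranges were disjoint, the hull would admit tuples that no query wants, so a real grouping plan would need to weigh that extra traffic against the savings from sharing.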