Partitioned query processing is an effective method for processing continuous queries with large stateful operators in distributed systems. This method typically partitions the input data into non-overlapping portions, with each query plan instance installed on a separate machine and processing only one portion of the data. Dynamic redistribution of load among machines is then employed to handle fluctuating stream characteristics. However, existing load redistribution solutions implicitly assume that no local query optimization is conducted at runtime on any of the participating machines, i.e., all local query plan instances are static and thus remain identical. This is restrictive for dynamic stream systems, where data partitions may experience significant fluctuations in selectivities or arrival rates over time, thus warranting local plan reoptimization. Local reoptimization in turn raises a new problem: load redistribution must cope with heterogeneous plan shapes across machines. To address this, we propose two new load balancing strategies, along with corresponding protocols, that balance the workload across a set of machines while seamlessly handling the complexity caused by local plan changes. The PTLB strategy is plan-agnostic, requiring no detailed knowledge of the underlying query plan. The MSLB strategy is plan-aware, that is, it rebalances the load by comparing the plan shape differences on the participating machines. All proposed techniques have been implemented in the DCAPE continuous query system. Our experiments demonstrate that applying both query optimization and load balancing results in superior performance compared to applying either adaptation technique alone, as has been the state of the art in the current literature. Our evaluation also compares the relative applicability and efficiency of the two proposed techniques, PTLB and MSLB.
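To make the general setting concrete, the following is a minimal sketch of partitioned processing with dynamic load redistribution. It is not PTLB or MSLB, whose details the abstract does not give; it is a generic greedy scheme, and every name in it (`partition_of`, `machine_loads`, `rebalance_step`, the `threshold` parameter, the per-partition load numbers) is a hypothetical chosen for illustration.

```python
import zlib


def partition_of(key, num_partitions):
    """Map a key to one of num_partitions non-overlapping partitions.

    crc32 is used instead of hash() so the mapping is stable across runs.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def machine_loads(assignment, partition_load, machines):
    """Sum the load of the partitions currently assigned to each machine."""
    loads = {m: 0.0 for m in machines}
    for pid, machine in assignment.items():
        loads[machine] += partition_load.get(pid, 0.0)
    return loads


def rebalance_step(assignment, partition_load, machines, threshold=0.2):
    """Move one partition from the most to the least loaded machine.

    A move happens only if the load gap exceeds `threshold` times the mean
    load (an assumed trigger condition). Returns (pid, src, dst) or None.
    In a real system the operator state of the moved partition would have
    to be migrated as well; that step is omitted here.
    """
    loads = machine_loads(assignment, partition_load, machines)
    src = max(machines, key=loads.get)
    dst = min(machines, key=loads.get)
    gap = loads[src] - loads[dst]
    mean = sum(loads.values()) / len(machines)
    if mean == 0 or gap <= threshold * mean:
        return None
    # Only partitions lighter than the gap can narrow it; among those,
    # prefer the one closest to half the gap.
    candidates = [
        pid for pid, machine in assignment.items()
        if machine == src and 0 < partition_load.get(pid, 0.0) < gap
    ]
    if not candidates:
        return None
    pid = min(candidates, key=lambda p: abs(partition_load[p] - gap / 2))
    assignment[pid] = dst
    return pid, src, dst


# Usage: four partitions, two machines, machine "A" initially overloaded.
assignment = {0: "A", 1: "A", 2: "A", 3: "B"}
partition_load = {0: 5.0, 1: 3.0, 2: 2.0, 3: 1.0}
machines = ["A", "B"]
first_move = rebalance_step(assignment, partition_load, machines)
second_move = rebalance_step(assignment, partition_load, machines)
```

This sketch treats every partition as an opaque unit with a measured load and never inspects the query plan. Once each machine may reoptimize its own plan, as the abstract describes, a redistribution scheme must additionally decide whether and how to account for the differing plan shapes on the source and destination machines.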