Skip to Main Content
Modular clusters are now composed of nonuniform nodes with different CPUs, disks or network cards so that customers can adapt the cluster configuration to the changing technologies and to their changing needs. This challenges dataflow parallelism as the primary load balancing technique of existing parallel database systems. We show in this paper that dataflow parallelism alone is ill suited for modular clusters because running the same operation on different subsets of the data can not fully utilize non-uniform hardware resources. We propose and evaluate new load balancing techniques that blend pipeline parallelism with data parallelism. We consider relational operators as pipelines of fine-grained operations that can be located on different cluster nodes and executed in parallel on different data subsets to best exploit non-uniform resources. We present an experimental study that confirms the feasibility and effectiveness of the new techniques in a parallel execution engine prototype based on the open-source DBMS Predator.