The emergence of grid computing and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. Here, the computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units. This paper reports on a compilation system developed to exploit this form of parallelism. We use a dialect of Java that exposes both pipelined and data parallelism to the compiler. Our compiler is responsible for selecting a set of candidate filter boundaries, determining the volume of communication required if a particular boundary is chosen, performing the decomposition, and generating code in which each filter unpacks data from a received buffer, iterates over its elements, and packs and forwards a buffer to the next stage. We have developed a one-pass algorithm for determining the required communication between consecutive filters, a cost model for estimating the execution time of a given decomposition, and a greedy algorithm for performing the decomposition. We have carried out a detailed evaluation of our current compiler using four data-driven applications. Our experimental results show the following: (1) the compiler-decomposed versions achieve improvements of 10% to 150% over versions that use pipelined parallelism in a default fashion, (2) in most cases, increasing the width of the pipeline results in near-linear speedups, and (3) for the two applications where we could compare against a manual version, the compiler-generated versions were generally quite close to the manual versions.
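As a hypothetical illustration (not the paper's actual generated code), a filter stage of the kind described above receives a packed buffer from the previous stage, iterates over its elements, and packs and forwards a buffer to the next stage. The sketch below uses `java.util.concurrent.BlockingQueue` to stand in for the communication links between consecutive filters, and an arbitrary per-element doubling as the stage's computation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of one pipeline filter: receive a buffer from the
// previous stage, iterate over its elements, pack and forward a buffer
// to the next stage. Queues stand in for inter-unit communication.
public class FilterStage {
    static void runFilter(BlockingQueue<int[]> in, BlockingQueue<int[]> out)
            throws InterruptedException {
        int[] received = in.take();          // unpack: receive buffer
        int[] forwarded = new int[received.length];
        for (int i = 0; i < received.length; i++) {
            forwarded[i] = received[i] * 2;  // per-element computation (arbitrary)
        }
        out.put(forwarded);                  // pack and forward to next stage
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<int[]> link1 = new ArrayBlockingQueue<>(1);
        BlockingQueue<int[]> link2 = new ArrayBlockingQueue<>(1);
        BlockingQueue<int[]> link3 = new ArrayBlockingQueue<>(1);

        // Two consecutive filters, each on its own "computing unit" (thread).
        Thread stage1 = new Thread(() -> {
            try { runFilter(link1, link2); } catch (InterruptedException e) { }
        });
        Thread stage2 = new Thread(() -> {
            try { runFilter(link2, link3); } catch (InterruptedException e) { }
        });
        stage1.start();
        stage2.start();

        link1.put(new int[] {1, 2, 3});
        int[] result = link3.take();
        stage1.join();
        stage2.join();
        System.out.println(java.util.Arrays.toString(result)); // prints [4, 8, 12]
    }
}
```

In the compiler described above, the boundaries between such stages and the contents of the forwarded buffers would be chosen automatically, guided by the cost model and the one-pass communication analysis.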