Skip to Main Content
Dynamic programming (DP) is a popular technique which is used to solve combinatorial search and optimization problems. This paper focuses on one type of DP, which is called nonserial polyadic dynamic programming (NPDP). Owing to the nonuniform data dependencies of NPDP, it is difficult to exploit either parallelism or locality. Worse still, the emerging multi/many-core architectures with small on-chip memory make these issues more challenging. In this paper, we address the challenges of exploiting the fine grain parallelism and locality of NPDP on multicore architectures. We describe a latency-tolerant model and a percolation technique for programming on multicore architectures. On an algorithmic level, both parallelism and locality do benefit from a specific data dependence transformation of NPDP. Next, we propose a parallel pipelining algorithm by decomposing computation operators and percolating data through a memory hierarchy to create just-in-time locality. In order to predict the execution time, we formulate an analytical performance model of the parallel algorithm. The parallel pipelining algorithm achieves not only high scalability on the 160-core IBM Cyclops64, but portable performance as well, across the 8-core Sun Niagara and quad-cores Intel Clovertown.