Divide-and-conquer programs can achieve good performance on parallel computers and on computers with deep memory hierarchies. We introduce architecture-cognizant divide-and-conquer algorithms and explore how they can achieve even better performance. An architecture-cognizant algorithm has functionally equivalent variants of the divide and/or combine functions, and a variant policy that specifies which variant to use at each level of recursion. An optimal variant policy is chosen for each target computer via experimentation. With h levels of recursion and v variants, an exhaustive search requires Θ(v^h) experiments. We present a method based on dynamic programming that reduces this to Θ(v^c) experiments (where c is typically a small constant) for a class of architecture-cognizant programs. We verify our technique on two kernels (matrix multiply and 2-D Point Jacobi) on three architectures. Our technique improves performance by up to a factor of two compared to architecture-oblivious divide-and-conquer implementations. Furthermore, our dynamic programming approach succeeds in selecting the optimal variant policy.
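To make the size of the search space concrete, the following Python sketch compares an exhaustive search over all v^h variant policies with a much cheaper per-level search. It is an illustration only and not the dynamic programming method of this work: it assumes, purely for the example, that the cost of a policy is the sum of independent per-level costs, which is the simplest case in which the search collapses. The names `toy_level_cost`, `exhaustive_search`, and `per_level_search`, and the cost table itself, are hypothetical stand-ins for real timing experiments.

```python
from itertools import product


def toy_level_cost(level, variant):
    """Hypothetical cost of using `variant` at recursion `level`.
    Stands in for a timing experiment on a real machine."""
    return ((level * 3 + variant * 5) % 7) + 1


def counted(fn):
    """Wrap fn so that every call is tallied as one 'experiment'."""
    tally = {"n": 0}

    def wrapped(*args):
        tally["n"] += 1
        return fn(*args)

    return wrapped, tally


def policy_cost(policy):
    """Cost of a whole policy under the ASSUMED additive model."""
    return sum(toy_level_cost(level, k) for level, k in enumerate(policy))


def exhaustive_search(v, h, measure_policy):
    """Measure every one of the v**h policies and keep the cheapest."""
    return min(product(range(v), repeat=h), key=measure_policy)


def per_level_search(v, h, measure_level):
    """Choose the cheapest variant at each level independently.
    Needs only v*h measurements, but is valid only when levels
    do not interact (the assumption made for this sketch)."""
    return tuple(
        min(range(v), key=lambda k: measure_level(level, k))
        for level in range(h)
    )


if __name__ == "__main__":
    v, h = 3, 4
    measure_policy, policy_runs = counted(policy_cost)
    measure_level, level_runs = counted(toy_level_cost)

    best_exhaustive = exhaustive_search(v, h, measure_policy)
    best_per_level = per_level_search(v, h, measure_level)

    print("exhaustive:", best_exhaustive, "experiments:", policy_runs["n"])
    print("per-level: ", best_per_level, "experiments:", level_runs["n"])
```

With v = 3 and h = 4, the exhaustive search measures 3^4 = 81 policies, while the per-level search makes 3 × 4 = 12 measurements. When the choice at one level affects the cost of neighbouring levels, the independence assumption no longer holds and a search that accounts for those interactions is needed.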