The Fast Multipole Method (FMM) allows O(N) evaluation, to arbitrary precision, of the N-body interactions that arise in many scientific contexts. These methods have been parallelized, and a recent set of papers attempted to parallelize them on heterogeneous CPU/GPU architectures. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong scalability. Further, the algorithms were not demonstrated on the nonuniform particle distributions that arise in practice. In this paper, we develop an efficient, scalable version of the FMM that scales well across many heterogeneous nodes for nonuniform data. The key contributions of our work are data structures that allow uniform work distribution over multiple computing nodes and that minimize communication cost. These new data structures are computed using a parallel algorithm and require only a small additional computational overhead. Numerical simulations on a heterogeneous cluster empirically demonstrate the performance of our algorithm.