The Fast Multipole Method (FMM) and the Multilevel Fast Multipole Algorithm (MLFMA) have been used to solve electromagnetic scattering problems for many years. Parallel implementations of MLFMA are currently a hot topic because they can solve scattering problems with tens of millions of unknowns at a complexity of O(N log N), where N is the number of unknowns. In this paper, we discuss a new perfectly parallel implementation of MLFMA. As the number of unknowns and the complexity of the objects being computed increase, the program's behavior, especially its communication behavior, becomes chaotic. It is therefore necessary to locate the bottlenecks and inefficient regions using existing performance analysis toolsets for parallel programs. The main focus of the present paper is how we use Scalasca (an open-source professional analysis toolset) and other analysis tools to analyze our parallel MLFMA implementation, find its bottlenecks and inefficient parts, and optimize and modify the code accordingly. The paper also highlights some necessary tricks that we employed, without which using Scalasca to analyze the program would have been impossible.