General-purpose computing on graphics processing units (GPGPU), with programming models such as NVIDIA's Compute Unified Device Architecture (CUDA), offers the capability to accelerate the solution process of computational electromagnetics analysis. However, because of the communication-intensive nature of the finite-element algorithm, neither the assembly phase nor the solution phase can be mapped onto fine-grained many-core GPU processors in a straightforward manner. In this paper, we identify the bottlenecks in the GPU parallelization of the finite-element method for electromagnetic analysis and propose potential solutions to alleviate them. We first discuss efficient parallelization strategies for the finite-element matrix assembly on a single GPU and on multiple GPUs. We then explore parallelization strategies for the finite-element matrix solution, in conjunction with parallelizable preconditioners, to reduce the total solution time. We show that, with proper parallelization and implementation, GPUs can achieve significant speedups over OpenMP-enabled multicore CPUs.