This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for the fast solution of sparse systems of equations arising in computational electromagnetics from the finite element method with higher-order elements. The goal of the tuning is to increase throughput while simultaneously reducing memory requirements, so that very large complex- or real-valued systems can be processed in single and double precision on commodity graphics processing units (GPUs). A threefold reduction of the memory footprint is achieved by means of a new sparse matrix storage format. Acceleration is achieved by optimizing the sparse matrix-vector product on the GPU using new features of the Fermi architecture. Further improvements are obtained by introducing more levels into the preconditioner and by applying a fast sparse direct solver to the operations executed on the CPU. Numerical results for a setup consisting of a Fermi GPU (GTX 480) and a six-core Xeon CPU show that the proposed approach can handle systems involving millions of unknowns and reach a speedup factor of almost 4 compared with a CPU-only implementation.
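The core iteration the letter accelerates can be sketched as follows. This is a minimal NumPy illustration of a preconditioned conjugate gradient loop, not the letter's implementation: a simple Jacobi (diagonal) preconditioner stands in for the multilevel preconditioner, a small dense tridiagonal system stands in for the letter's sparse FEM systems with millions of unknowns, and the matrix-vector product is an ordinary CPU operation rather than the GPU-optimized sparse kernel described above.

```python
import numpy as np

def pcg(A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric
    positive-definite system A x = b. apply_prec(r) applies
    the preconditioner (here Jacobi, as a stand-in for the
    letter's multilevel preconditioner)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p  # in the letter, this SpMV is the GPU-accelerated kernel
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small illustrative SPD system (a 1-D Laplacian stencil).
n = 50
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
b = np.ones(n)
d = np.diag(A)
x = pcg(A, b, lambda r: r / d)  # Jacobi: divide by the diagonal
print(np.allclose(A @ x, b, atol=1e-8))
```

The preconditioner quality dominates the iteration count, while the per-iteration cost is dominated by the sparse matrix-vector product; this is why the letter targets both, adding levels to the preconditioner and tuning the SpMV kernel for Fermi GPUs.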