Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations

3 Author(s)
Dziekonski, A.; Dept. of Microwave & Antenna Eng., Gdansk Univ. of Technol., Gdansk, Poland; Lamecki, A.; Mrozowski, M.

This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow one to process very large complex or real systems in single and double precision using commodity graphic processing units (GPUs). A threefold memory footprint reduction is achieved by means of a new format of storing sparse matrices. The acceleration is achieved by optimizing a sparse matrix-vector product on a GPU by applying new features of the Fermi architecture. Further improvements are obtained by introducing more levels into the preconditioner and the application of a fast sparse direct solver for the operations executed on a CPU. Numerical results for a setup consisting of a Fermi GPU (GTX 480) and a Xeon six-core CPU showed that the proposed approach allows one to handle systems involving millions of unknowns and reach the speedup factor of almost 4 compared to the CPU-only implementation.
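For readers unfamiliar with the solver named in the abstract, the following is a minimal sketch of a preconditioned conjugate gradient iteration built around a sparse matrix-vector product. It is not the authors' implementation. It assumes a real symmetric positive definite matrix in the standard CSR layout (not the new storage format the letter proposes), runs on the CPU only, and uses a simple Jacobi (diagonal) preconditioner as a stand-in for the multilevel V-cycle. The names `csr_matvec`, `pcg`, and `apply_preconditioner` are hypothetical.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(values, col_idx, row_ptr, b, apply_preconditioner,
        tol=1e-10, max_iter=1000):
    """Solve A x = b for real SPD A; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero start
    z = apply_preconditioner(r)      # assumption: stand-in for the V-cycle
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for iteration in range(1, max_iter + 1):
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Toy system: the 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]].
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [1.0, 0.0, 1.0]
diagonal = [2.0, 2.0, 2.0]


def jacobi(r):
    # Jacobi preconditioner: divide each residual entry by the diagonal.
    return [ri / di for ri, di in zip(r, diagonal)]


x, iterations = pcg(values, col_idx, row_ptr, b, jacobi)
```

In this sketch the sparse matrix-vector product is called once per iteration and dominates the cost, which is consistent with the abstract's focus on optimizing that kernel on the GPU. Complex-valued systems would need a different inner product and are not covered here.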

Published in:

Antennas and Wireless Propagation Letters, IEEE (Volume: 10)