A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization

Open Access

Abstract:

In this paper, we aim to accelerate sparse LU factorization on the GPU. We present a tiled storage format and a parallel algorithm that improve the memory access pattern, and a register blocking method that compresses the on-chip working set. The OpenMP implementation of our algorithm delivers more stable performance across different matrices and outperforms SuperLU and KLU by 1.88 to 6 times on an Intel 8-core CPU (central processing unit) for matrices from the University of Florida Sparse Matrix Collection. Building on this algorithm, we further propose a GPU-CPU hybrid pipelined scheme that overlaps computation on the CPU with computation on the GPU. In double precision, our GPU implementation achieves a 1.1- to 19.7-fold speedup over the better of SuperLU and KLU on an Intel 8-core CPU, and up to a 2-fold speedup (in the best cases) over the OpenMP implementation of our algorithm on an Intel 8-core CPU.
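The abstract does not spell out the tiled storage format or the factorization kernels. As a rough illustration of the general idea only, the following Python sketch stores a matrix as a dictionary of dense b x b tiles, omits tiles that are entirely zero, and runs a textbook right-looking tiled LU without pivoting. The tile layout, the function names (to_tiles, lu_tile, solve_lower, solve_upper_right, gemm_sub, tiled_lu), and the no-pivoting assumption are hypothetical choices for this sketch, not the paper's method; it also omits the register blocking and the GPU-CPU pipelining described above.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (not the paper's): dense b x b tiles keyed by
# (tile_row, tile_col); tiles that are entirely zero are simply not stored.
Tile = List[List[float]]
TiledMatrix = Dict[Tuple[int, int], Tile]


def to_tiles(a: List[List[float]], b: int) -> TiledMatrix:
    """Split an n x n matrix (n divisible by b) into b x b tiles, dropping all-zero tiles."""
    nt = len(a) // b
    tiles: TiledMatrix = {}
    for bi in range(nt):
        for bj in range(nt):
            t = [row[bj * b:(bj + 1) * b] for row in a[bi * b:(bi + 1) * b]]
            if any(v != 0.0 for r in t for v in r):
                tiles[(bi, bj)] = t
    return tiles


def lu_tile(t: Tile) -> None:
    """In-place LU of a diagonal tile without pivoting (unit L below, U on/above the diagonal)."""
    for k in range(len(t)):
        for i in range(k + 1, len(t)):
            t[i][k] /= t[k][k]
            for j in range(k + 1, len(t)):
                t[i][j] -= t[i][k] * t[k][j]


def solve_lower(lkk: Tile, t: Tile) -> None:
    """t <- L^-1 t, where L is the unit lower triangle of lkk (forward substitution)."""
    b = len(t)
    for c in range(b):
        for i in range(b):
            for k in range(i):
                t[i][c] -= lkk[i][k] * t[k][c]


def solve_upper_right(ukk: Tile, t: Tile) -> None:
    """t <- t U^-1, where U is the upper triangle of ukk."""
    b = len(t)
    for r in range(b):
        for j in range(b):
            for k in range(j):
                t[r][j] -= t[r][k] * ukk[k][j]
            t[r][j] /= ukk[j][j]


def gemm_sub(c: Tile, a: Tile, bt: Tile) -> None:
    """c <- c - a * bt (Schur-complement update of one trailing tile)."""
    b = len(c)
    for i in range(b):
        for k in range(b):
            aik = a[i][k]
            if aik != 0.0:
                for j in range(b):
                    c[i][j] -= aik * bt[k][j]


def tiled_lu(tiles: TiledMatrix, nt: int) -> None:
    """Right-looking tiled LU in place; assumes all diagonal tiles are stored and no pivoting is needed."""
    b = len(tiles[(0, 0)])
    for k in range(nt):
        lu_tile(tiles[(k, k)])
        # Panel solves: the stored tiles in row k and in column k are mutually independent.
        for j in range(k + 1, nt):
            if (k, j) in tiles:
                solve_lower(tiles[(k, k)], tiles[(k, j)])
        for i in range(k + 1, nt):
            if (i, k) in tiles:
                solve_upper_right(tiles[(k, k)], tiles[(i, k)])
        # Trailing update: every (i, j) pair is independent, which is where a
        # parallel (e.g. OpenMP or GPU) version would distribute the work.
        for i in range(k + 1, nt):
            if (i, k) not in tiles:
                continue
            for j in range(k + 1, nt):
                if (k, j) not in tiles:
                    continue
                if (i, j) not in tiles:
                    tiles[(i, j)] = [[0.0] * b for _ in range(b)]  # fill-in tile
                gemm_sub(tiles[(i, j)], tiles[(i, k)], tiles[(k, j)])
```

Skipping all-zero tiles lets the loops touch only stored tiles, so sparsity is exploited at tile granularity, and fill-in appears as newly allocated tiles during the trailing update. A production sparse solver would also need a pivoting strategy and a symbolic analysis phase to predict fill-in; neither is modeled in this sketch.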
Published in: Chinese Journal of Electronics (Volume 21, Issue 1, January 2012)
Pages: 7-12
Date of Publication: January 2012
