Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers

Author:

Rothberg, E.; Intel Sci. Comput., Beaverton, OR, USA

Sparse Cholesky factorization has historically achieved extremely low performance on distributed-memory multiprocessors. Three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. All of these issues have in fact already been addressed. Specifically: (1) single-node performance can be improved by moving from a column-oriented approach, where the computational kernel is Level 1 BLAS, to either a panel- or block-oriented approach, where the kernel is Level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers providing higher communication bandwidth than previous parallel computers; and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision MFLOPS on 32 processors of the Intel Paragon system, 1 GFLOPS on 64 processors, and 1.7 GFLOPS on 128 processors. This paper also presents a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.
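The kernel change the abstract describes — from column-at-a-time updates (Level 1 BLAS vector operations) to block updates (Level 3 BLAS matrix-matrix products) — can be illustrated with a dense right-looking block Cholesky. This is a hedged sketch, not the paper's sparse algorithm: the function name, block size, and use of numpy are illustrative assumptions, and a real implementation would operate on the sparse supernodal structure and call optimized BLAS.

```python
import numpy as np

def block_cholesky(A, b=2):
    """Right-looking block Cholesky factorization A = L @ L.T.

    Illustrative dense sketch only. The trailing-matrix update in
    step 3 is a matrix-matrix product (a Level 3 BLAS operation),
    whereas a column-by-column factorization performs the same work
    as many Level 1 BLAS vector updates -- the kernel difference
    the abstract credits for the single-node speedup.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(0, n, b):
        e = min(k + b, n)
        # 1. Factor the small diagonal block.
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        if e < n:
            # 2. Panel below the block: solve L11 @ L21.T = A21.T.
            A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
            # 3. Level 3 update of the trailing submatrix.
            A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
    return np.tril(A)

# Usage: factor a small symmetric positive-definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)          # SPD by construction
L = block_cholesky(A, b=2)
```

The practical point is that step 3 touches a large submatrix with a single dense product, which caches and pipelines far better than an equivalent sequence of vector operations.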

Published in:

Proceedings of the Scalable High-Performance Computing Conference, 1994

Date of Conference:

23-25 May 1994