Abstract:
Optimizing a particular subprogram out of the set of Basic (sparse) Linear Algebra Subprograms (BLAS) for a given architecture is a common topic of research. In applications, however, these BLAS functions rarely appear in isolation; usually, many of them are used together, in various combinations and with varying inputs. As the need to solve a large, sparse linear system is ubiquitous throughout HPC applications, linear solvers constitute a realistic, sufficiently complex, and well-defined representative use case for composite BLAS routines. To this end, based on a representative set of matrices drawn from a diverse set of fields, we present a framework to study, from the performance and energy perspectives, the efficacy of a GPU-resident parallel Conjugate Gradient (CG) linear solver with different preconditioner options, including Gauss-Seidel, Jacobi, and incomplete Cholesky. We also propose a novel GPU-based preconditioner, in which the triangular solves are approximated by an iterative process. The development of this preconditioner was motivated by solving large graph Laplacian linear systems, for which the existing preconditioners either perform slowly on GPU-based platforms or are not applicable. We compare the performance of these preconditioners on different hardware accelerator architectures, i.e., AMD MI250X, MI100, Nvidia A100, V100, and Jetson. Our experiments reveal performance trade-offs and provide guidance on selecting the best strategy for a given linear system, dictated by its properties and the platform of interest. We demonstrate the application of our novel preconditioner for solving graph Laplacian systems with CG. Overall, the framework can be utilized as a benchmark to guide informed decisions in choosing a specific preconditioner, i.e., whether it is better to rely on the performance of a triangular solver or on the performance of the sparse matrix-vector product.
Finally, by considering power consumption to solve the linear systems, we rep...
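The key idea behind the proposed preconditioner is replacing the exact (inherently sequential) sparse triangular solve with a few fixed-point sweeps, each of which is a parallel-friendly matrix-vector product. A minimal sketch of this general technique, using dense NumPy arrays and a hypothetical function name for illustration (the paper's actual GPU implementation and sparse data structures are not shown here):

```python
import numpy as np

def approx_lower_tri_solve(L, b, num_sweeps=5):
    """Approximately solve L x = b for lower-triangular L using
    Jacobi sweeps: x_{k+1} = D^{-1} (b - (L - D) x_k).
    Each sweep is a matrix-vector product plus a diagonal scaling,
    avoiding the sequential dependency chain of an exact solve.
    Because the iteration matrix is strictly lower triangular (nilpotent),
    the iteration reaches the exact solution after at most n sweeps."""
    D = np.diag(L)          # diagonal entries of L
    E = L - np.diag(D)      # strictly lower-triangular part
    x = b / D               # initial guess: diagonal scaling only
    for _ in range(num_sweeps):
        x = (b - E @ x) / D
    return x
```

In a preconditioner context, only a small fixed number of sweeps is applied, trading accuracy of the triangular solve for throughput on the GPU.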
Published in: 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
Date of Conference: 13-15 November 2024
Date Added to IEEE Xplore: 27 November 2024