Abstract:
The proposed FSCHOL framework consists of two parts: an FPGA kernel implementing a throughput-optimized hardware architecture that accelerates the supernodal multifrontal algorithm for sparse Cholesky factorization, and a host program implementing a novel scheduling algorithm that finds the optimal execution order of supernode computations for an elimination tree on the FPGA, eliminating the need for off-chip memory accesses to store intermediate results. Moreover, the proposed scheduling algorithm minimizes on-chip memory requirements for buffering intermediate results by resolving the dependency of parent nodes in an elimination tree through temporal parallelism. Experimental results for factorizing a set of sparse matrices of various sizes from the SuiteSparse Matrix Collection show that FSCHOL, implemented on an Intel Stratix 10 GX FPGA development board, achieves on average 5.5× and 9.7× higher performance and 10.3× and 24.7× lower energy consumption than implementations of CHOLMOD on an Intel Xeon E5-2637 CPU and an NVIDIA V100 GPU, respectively.
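For readers unfamiliar with the elimination tree that the scheduler operates on, the following is a minimal sketch and not the FSCHOL scheduling algorithm itself, which the abstract does not specify. It builds the elimination tree of a symmetric sparsity pattern using the standard ancestor-compression method and then lists the nodes in postorder, the classic ordering in which every child is processed before its parent. The function names, the `upper_pattern` input layout, and the 5×5 example pattern are all assumptions made for illustration.

```python
def elimination_tree(n, upper_pattern):
    """Return parent[] of the elimination tree (hypothetical helper).

    upper_pattern[k] lists the row indices i < k with A[i, k] != 0,
    i.e. the strictly upper-triangular pattern of column k.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current root
    for k in range(n):
        for i in upper_pattern[k]:
            # Walk up from i until we reach a node already linked to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k  # i had no parent yet, so k becomes it
                i = nxt
    return parent


def postorder(parent):
    """Return the nodes in an order where children precede their parents."""
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for v, p in enumerate(parent):
        (roots if p == -1 else children[p]).append(v)
    order = []
    for r in roots:
        stack = [(r, 0)]  # (node, index of next child to visit)
        while stack:
            v, idx = stack.pop()
            if idx < len(children[v]):
                stack.append((v, idx + 1))
                stack.append((children[v][idx], 0))
            else:
                order.append(v)
    return order


# Illustrative 5x5 pattern (an assumption, not taken from the paper):
# column 2 has nonzeros in rows 0 and 1, column 4 in rows 2 and 3.
pattern = {0: [], 1: [], 2: [0, 1], 3: [], 4: [2, 3]}
parent = elimination_tree(5, pattern)
order = postorder(parent)
```

In a multifrontal factorization, a child-before-parent order like this is what lets each node's intermediate update matrix be consumed by its parent; how FSCHOL chooses among the valid orders to keep those intermediates on chip is described in the paper, not here.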
Published in: 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
Date of Conference: 26-29 October 2021
Date Added to IEEE Xplore: 28 December 2021
- IEEE Keywords
- Index Terms
- High-performance Computing
- Cholesky Decomposition
- Energy Consumption
- Sparse Matrix
- Tree Nodes
- Intermediate Results
- Low Energy Consumption
- Set Of Matrices
- Optimal Order
- Scheduling Algorithm
- Hardware Architecture
- Sparse Factorization
- Off-chip Memory
- Matrix Elements
- Power Consumption
- Column Vector
- Element Of Vector
- Forward Error Correction
- Clock Rate
- Critical Path
- Updated Matrix
- GPU Implementation
- Inverse Square Root
- Intermediate Data
- Sparse Algorithm
- Sparsity Pattern
- High-level Synthesis
- Load Modulation
- Amount Of Access
- Job Information