High Performance Householder QR Factorization on Emerging GPU Architectures Using Tensor Cores

Abstract:

Since 2017, NVIDIA GPUs have been equipped with specialized units known as Tensor Cores, which process matrix multiplications (GEMMs) with remarkable efficiency. Beyond GEMMs, researchers have explored applying Tensor Cores to matrix factorizations such as QR factorization. However, the GEMMs inside QR factorization are typically tall and skinny. Unlike compute-bound square GEMMs, these tall and skinny GEMMs are memory bound, leading to suboptimal performance on Tensor Cores. To solve this problem, we show that recursive QR factorization can convert the tall and skinny GEMMs into larger, more nearly square GEMMs, which perform better on Tensor Cores. In addition, we extend the FP16 Tensor Core-based QR factorization to support FP32 and FP64 on FP16 and INT8 Tensor Cores, respectively. Furthermore, to address the loss of orthogonality in the preceding Tensor Core-based QR factorization, we switch from the Gram-Schmidt algorithm to the Householder algorithm while preserving high performance. In our experimental evaluation on NVIDIA's A100 and GeForce RTX 3090 GPUs, our FP64, FP32, and FP16 implementations are up to 6.22x, 8.67x, and 4.03x faster, respectively, than the current state-of-the-art implementations.
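
To illustrate the idea of recursive Householder QR mentioned in the abstract, the following is a minimal NumPy sketch. It is not the authors' implementation: it runs in FP64 on the CPU, uses no Tensor Cores or mixed precision, and assumes the textbook recursive compact-WY formulation (Q = I - Y T Y^T), which the abstract does not spell out. The function names `house` and `recursive_qr` are hypothetical, and the input is assumed to have at least as many rows as columns.

```python
import numpy as np

def house(x):
    """Householder reflector: (I - tau * v v^T) x = beta * e_1, with v[0] = 1."""
    normx = np.linalg.norm(x)
    if normx == 0.0:
        v = np.zeros_like(x)
        v[0] = 1.0
        return v, 0.0, 0.0
    alpha = x[0]
    beta = -np.copysign(normx, alpha)   # sign chosen to avoid cancellation
    v = x.copy()
    v[0] = alpha - beta
    v = v / v[0]                        # normalize so that v[0] == 1
    tau = (beta - alpha) / beta
    return v, tau, beta

def recursive_qr(A):
    """Recursive Householder QR of an m x n matrix (m >= n assumed).

    Returns (Y, T, R) such that Q = I - Y T Y^T is orthogonal and
    A = Q[:, :n] @ R, with R upper triangular (n x n).
    """
    A = np.asarray(A, dtype=np.float64)
    m, n = A.shape
    if n == 1:
        v, tau, beta = house(A[:, 0])
        return v.reshape(m, 1), np.array([[tau]]), np.array([[beta]])

    n1 = n // 2
    # Factor the left half of the columns.
    Y1, T1, R11 = recursive_qr(A[:, :n1])
    # Apply Q1^T to the right half: three GEMMs whose inner dimension is n1,
    # i.e. half the current width rather than a fixed narrow panel.
    B = A[:, n1:] - Y1 @ (T1.T @ (Y1.T @ A[:, n1:]))
    R12 = B[:n1, :]
    # Factor the remaining trailing block.
    Y2_bottom, T2, R22 = recursive_qr(B[n1:, :])
    Y2 = np.vstack([np.zeros((n1, n - n1)), Y2_bottom])
    # Merge the two compact-WY representations.
    T12 = -T1 @ ((Y1.T @ Y2) @ T2)
    Y = np.hstack([Y1, Y2])
    T = np.block([[T1, T12], [np.zeros((n - n1, n1)), T2]])
    R = np.block([[R11, R12], [np.zeros((n - n1, n1)), R22]])
    return Y, T, R

# Usage: factor a tall matrix and measure the orthogonality and residual errors.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 32))
Y, T, R = recursive_qr(A)
Q = np.eye(A.shape[0]) - Y @ T @ Y.T
orth_err = np.linalg.norm(Q.T @ Q - np.eye(A.shape[0]))
resid_err = np.linalg.norm(Q[:, :A.shape[1]] @ R - A)
print(orth_err, resid_err)
```

In this sketch the trailing update at the top level of the recursion multiplies blocks that are n/2 columns wide, which is the sense in which recursion replaces many tall and skinny products with fewer, larger ones. How the paper maps these products onto FP16 or INT8 Tensor Cores is beyond what this example shows.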

Published in: IEEE Transactions on Parallel and Distributed Systems ( Volume: 36, Issue: 3, March 2025)

Page(s): 422 - 436

Date of Publication: 25 December 2024

DOI: 10.1109/TPDS.2024.3522776
