Abstract:
Tensor cores in recent NVIDIA GPUs have drawn attention for their superior computation throughput on general matrix-matrix multiplication (GEMM), which is widely used in deep learning applications. For large-scale GEMMs, the matrices are in practice divided into sub-matrices that are assigned to multiple thread blocks and warps and then processed by the tensor cores. Meanwhile, the same sub-matrix is regularly reused as an input to different sub-GEMMs, causing redundant load operations from different warps and wasting register file space. To tackle this issue, we propose INTERPRET, a novel tensor core microarchitecture designed to minimize unnecessary accesses to the cache/memory hierarchy by leveraging inter-warp data reuse. INTERPRET adopts a register renaming scheme that eliminates redundant load requests and reduces register file waste, thereby lowering the effective data load latency. INTERPRET further improves performance via non-speculative tensor preloading, which exploits the register file space freed by register renaming. Because INTERPRET builds on the highly regular data access patterns of tensor core operations, the synergistic integration of register renaming and tensor preloading significantly improves processing efficiency. Our experiments show that the proposed design achieves an average speedup of 34.1% and reduces energy consumption by 27.9%.
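To illustrate the inter-warp reuse the abstract describes, the following is a minimal sketch of a conventional tiled WMMA GEMM kernel, not the paper's implementation: each warp computes one output tile, so every warp sharing a tile row reloads the same A tile and every warp sharing a tile column reloads the same B tile into its own fragment registers. Kernel name, grid mapping, and tile sizes are illustrative assumptions.

```cuda
#include <mma.h>
using namespace nvcuda;

// Standard 16x16x16 WMMA shape for half inputs with float accumulation.
constexpr int WMMA_M = 16, WMMA_N = 16, WMMA_K = 16;

// Hypothetical baseline kernel: one warp per 16x16 tile of C = A * B.
// Warps with the same warpM issue identical loads of the A tile, and
// warps with the same warpN identical loads of the B tile -- the
// redundant inter-warp loads that INTERPRET targets.
__global__ void wmma_gemm(const half *A, const half *B, float *C,
                          int M, int N, int K) {
    // Map warps onto the output tile grid (rows in x, columns in y).
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;

    wmma::fragment<wmma::matrix_a, WMMA_M, WMMA_N, WMMA_K, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, WMMA_M, WMMA_N, WMMA_K, half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, WMMA_M, WMMA_N, WMMA_K, float> cFrag;
    wmma::fill_fragment(cFrag, 0.0f);

    for (int k = 0; k < K; k += WMMA_K) {
        int aRow = warpM * WMMA_M;              // A tile depends on warpM only
        int bCol = warpN * WMMA_N;              // B tile depends on warpN only

        if (aRow < M && bCol < N) {
            // Each warp loads the shared tiles into private registers,
            // duplicating data already fetched by neighboring warps.
            wmma::load_matrix_sync(aFrag, A + aRow * K + k, K);    // A: row-major, ld = K
            wmma::load_matrix_sync(bFrag, B + bCol * K + k, K);    // B: col-major, ld = K
            wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);
        }
    }

    int cRow = warpM * WMMA_M, cCol = warpN * WMMA_N;
    if (cRow < M && cCol < N)
        wmma::store_matrix_sync(C + cRow * N + cCol, cFrag, N, wmma::mem_row_major);
}
```

In this baseline, the per-warp fragment loads are where INTERPRET's register renaming would let warps alias an already-loaded tile instead of re-fetching it; this sketch shows only the reuse pattern, not the proposed microarchitecture.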
Published in: 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT)
Date of Conference: 21-25 October 2023
Date Added to IEEE Xplore: 27 December 2023