Abstract:
Machine learning models with various tensor operators have become ubiquitous in recent years. These operators fall into two types: compute-intensive operators (e.g., GEMM and convolution) and memory-intensive operators (e.g., ReLU and softmax). In emerging machine learning models, compute-intensive operators are often organized in a chain structure. With the continual specialization of hardware, the gap between compute performance and memory bandwidth has become more pronounced. Consequently, the implementations of many compute-intensive operator chains are bounded by memory bandwidth, and generating fused kernels that improve locality across these compute-intensive operators becomes necessary. However, existing machine learning compilers lack both precise analysis and efficient optimization for compute-intensive operator chains on different accelerators, and as a result they usually deliver sub-optimal performance for these operator chains.

In this paper, we propose Chimera, an optimizing framework that efficiently improves the locality of compute-intensive operator chains on different hardware accelerators. In Chimera, each compute-intensive operator is composed of a series of computation blocks. Generating efficient fused kernels for an operator chain requires both inter-block and intra-block optimization. For inter-block optimization, Chimera decides an optimized block execution order by minimizing the data movement volume among blocks using an analytical model. For intra-block optimization, Chimera uses unified, replaceable micro kernels to apply hardware-specific optimizations on different accelerators. Finally, Chimera generates fused kernels for the compute-intensive operator chains. Evaluation on batch GEMM chains and convolution chains on CPU, GPU, and NPU shows that Chimera achieves up to 2.87×, 2.29×, and 2.39× speedups over hand-tuned libraries. Compared to state-of-the-art compilers, the speedups are up to 2.29×, 1.64×, and ...
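To illustrate the locality problem the abstract describes, the following is a minimal sketch (not Chimera's actual implementation) of a two-operator GEMM chain D = (A @ B) @ C. The unfused version materializes the full intermediate matrix in memory; the fused version processes one row tile at a time so the intermediate tile can stay resident in fast memory. The tile size and loop structure here are hypothetical choices for illustration only.

```python
import numpy as np

def gemm_chain_unfused(A, B, C):
    # Writes the full M x K2 intermediate T to memory, then reads it back.
    T = A @ B
    return T @ C

def gemm_chain_fused(A, B, C, tile_m=64):
    # Hypothetical row-tiled fusion: each intermediate tile is consumed
    # immediately, so on a real accelerator it could remain in cache,
    # shared memory, or scratchpad instead of spilling to DRAM.
    M, N = A.shape[0], C.shape[1]
    D = np.empty((M, N), dtype=A.dtype)
    for m0 in range(0, M, tile_m):
        m1 = min(m0 + tile_m, M)
        T_tile = A[m0:m1] @ B      # tile_m x K2 intermediate tile
        D[m0:m1] = T_tile @ C
    return D

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 128))
    B = rng.standard_normal((128, 512))
    C = rng.standard_normal((512, 64))
    assert np.allclose(gemm_chain_unfused(A, B, C),
                       gemm_chain_fused(A, B, C))
```

In this toy setting both versions compute the same result; the point is only that fusion changes how much intermediate data moves through memory, which is the quantity Chimera's analytical model reasons about when choosing a block execution order.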
Date of Conference: 25 February 2023 - 01 March 2023
Date Added to IEEE Xplore: 24 March 2023