Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks


Abstract:

Exploiting model sparsity to reduce ineffectual computation is a commonly used approach to achieve energy efficiency for DNN inference accelerators. However, due to the tightly coupled crossbar structure, exploiting sparsity for ReRAM-based NN accelerators is a less explored area. Existing architectural studies on ReRAM-based NN accelerators assume that an entire crossbar array can be activated in a single cycle. However, due to inference accuracy considerations, matrix-vector computation must in practice be conducted at a smaller granularity, called an Operation Unit (OU). An OU-based architecture creates a new opportunity to exploit DNN sparsity. In this paper, we propose the first practical Sparse ReRAM Engine that exploits both weight and activation sparsity. Our evaluation shows that the proposed method is effective in eliminating ineffectual computation, and delivers significant performance improvement and energy savings.
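To make the OU idea concrete, here is a minimal Python sketch of OU-granularity matrix-vector computation that skips ineffectual OUs. The function name, the OU dimensions, and the skipping condition are illustrative assumptions for this sketch, not the paper's implementation; the paper's engine operates on compressed weights in hardware rather than on dense NumPy arrays.

```python
import numpy as np

# Hypothetical OU size; practical designs use small sub-blocks of the
# crossbar (a few wordlines by a few bitlines) for accuracy reasons.
OU_ROWS, OU_COLS = 9, 8

def sparse_ou_matvec(weights, activations):
    """Compute activations @ weights one Operation Unit (OU) at a time.

    An OU here is an OU_ROWS x OU_COLS sub-block of the crossbar. An OU
    is skipped when all of its weights are zero (weight sparsity) or all
    of its input activations are zero (activation sparsity) -- the kind
    of ineffectual computation the Sparse ReRAM Engine eliminates.
    """
    rows, cols = weights.shape
    result = np.zeros(cols)
    ous_total = ous_run = 0
    for r in range(0, rows, OU_ROWS):
        act = activations[r:r + OU_ROWS]
        for c in range(0, cols, OU_COLS):
            ous_total += 1
            w = weights[r:r + OU_ROWS, c:c + OU_COLS]
            # Skip ineffectual OUs: all-zero inputs or all-zero weights.
            if not act.any() or not w.any():
                continue
            ous_run += 1
            result[c:c + OU_COLS] += act @ w
    return result, ous_run, ous_total

# Usage: a 90x80 layer with 80% of its weights pruned, driven by
# post-ReLU (partly zero) activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((90, 80))
W[rng.random(W.shape) < 0.8] = 0.0
x = np.maximum(rng.standard_normal(90), 0.0)
y, run, total = sparse_ou_matvec(W, x)
assert np.allclose(y, x @ W)  # skipped OUs contribute exactly zero
print(f"executed {run} of {total} OUs")
```

Note that with unstructured sparsity, a whole OU is rarely all-zero, which is presumably why the paper pairs OU-level skipping with network compression in order to expose skippable OUs.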
Date of Conference: 22-26 June 2019
Date Added to IEEE Xplore: 06 February 2020
Conference Location: Phoenix, AZ, USA
