Bit-Serial Cache: Exploiting Input Bit Vector Repetition to Accelerate Bit-Serial Inference


Abstract:

Bit-serial computation has demonstrated superiority in processing precision-varying DNNs by slicing multi-bit vectors into multiple single-bit vectors and computing the inner product through multiple steps of shift-and-add. In this paper, we identify that real-world DNN inference under bit-serial computation exhibits high input bit vector locality: up to 85.7% of non-zero input bit vectors, along with their associated computation, have already been seen and computed. We propose Bit-Serial Cache to translate this locality into performance and energy efficiency gains. The key design strategy is to store recently computed partial sums of input bit vectors in a cache and to replace redundant computations with cache accesses. On top of the bit-serial computation architecture, we also present 1) request clustering and 2) interleaved scheduling to further enhance performance and energy efficiency. Our experiments on six popular DNNs (in both 8-b and 4-b precision) show that Bit-Serial Cache speeds up DNN inference by up to 2.72×, 1.82×, and 4.03×, improves energy efficiency by 3.19×, 3.29×, and 2.82×, and improves area efficiency by 1.35×, 1.24×, and 2.76× over the state-of-the-art Loom, DPRed Loom, and Laconic, respectively.
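
The following is a minimal software sketch of the idea described in the abstract, not the paper's hardware design: an inner product is evaluated bit-serially over the activation bit planes, and the partial sum for each distinct non-zero input bit vector is memoized so that a repeated bit vector becomes a cache lookup instead of a recomputation. All names here (PSumCache, bit_serial_dot, the LRU policy, and the cache capacity) are illustrative assumptions.

```python
from collections import OrderedDict

class PSumCache:
    """Tiny LRU cache mapping an input bit vector -> its partial sum
    (the sum of the weights selected by the set bits)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None

    def insert(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

def bit_serial_dot(x, w, bits=8, cache=None):
    """Compute sum_i x[i] * w[i] with unsigned `bits`-bit activations x,
    processing one bit plane of x per step (shift-and-add)."""
    acc = 0
    for b in range(bits):
        # Bit plane b of x, i.e. the "input bit vector" for this step.
        plane = tuple((xi >> b) & 1 for xi in x)
        if not any(plane):
            continue  # all-zero bit vectors contribute nothing and are skipped
        psum = cache.lookup(plane) if cache is not None else None
        if psum is None:
            # Cache miss: do the actual work, then remember the result.
            psum = sum(wi for pi, wi in zip(plane, w) if pi)
            if cache is not None:
                cache.insert(plane, psum)
        acc += psum << b  # shift-and-add of the (possibly cached) partial sum
    return acc

if __name__ == "__main__":
    x = [3, 0, 255, 3]   # 8-bit activations
    w = [1, -2, 4, 5]
    cache = PSumCache()
    assert bit_serial_dot(x, w, bits=8, cache=cache) == sum(a * b for a, b in zip(x, w))
    print("cache hits:", cache.hits, "misses:", cache.misses)
```

In this toy run only two distinct non-zero bit planes occur, so six of the eight shift-and-add steps are served from the cache; the paper's request clustering and interleaved scheduling concern how such lookups are organized in hardware and are not modeled here.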
Date of Conference: 09-13 July 2023
Date Added to IEEE Xplore: 15 September 2023
Conference Location: San Francisco, CA, USA
