Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks


Abstract:

We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks, an in-SRAM architecture for accelerating Convolutional Neural Network (CNN) inference by leveraging network redundancy and massive parallelism. The network redundancy is exploited in two ways. First, we prune and fine-tune the trained network model and develop two distinct methods, coalescing and overlapping, to run inference efficiently with sparse models. Second, we propose an architecture for network models with a reduced bit width by leveraging bit-serial computation. Our proposed architecture achieves a 17.7×/3.7× speedup over a server-class CPU/GPU and a 1.6× speedup over the relevant in-cache accelerator, with a 2% area overhead per processor die and no loss in top-1 accuracy for AlexNet. With a relaxed accuracy limit, our tunable architecture achieves higher speedups.
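
As a rough illustration of why a reduced bit width helps bit-serial computation, the following minimal Python sketch computes a dot product one weight bit at a time, so the number of steps equals the weight bit width. It is not the architecture described in the paper: the function name `bit_serial_dot`, the unsigned-integer encoding, and the software loop standing in for parallel SRAM lanes are all assumptions made for illustration.

```python
def bit_serial_dot(weights, activations, bits):
    """Dot product of unsigned integers, consuming one weight bit per step.

    Hypothetical illustration only: each loop iteration stands in for one
    bit-serial step applied to all lanes at once.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    max_value = (1 << bits) - 1
    if any(w < 0 or w > max_value for w in weights):
        raise ValueError("each weight must fit in the given unsigned bit width")

    total = 0
    for b in range(bits):  # one step per bit position, least significant first
        # A lane contributes its activation only if bit b of its weight is set.
        partial = sum(a for w, a in zip(weights, activations) if (w >> b) & 1)
        total += partial << b  # scale the partial sum by the bit's place value
    return total


weights = [3, 0, 5, 2]       # all fit in 3 bits
activations = [1, 4, 2, 7]
result = bit_serial_dot(weights, activations, bits=3)
```

In this sketch, halving the weight bit width halves the number of loop iterations, and a pruned (zero) weight never contributes to any partial sum.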
Date of Conference: 16-20 February 2019
Date Added to IEEE Xplore: 28 March 2019
Conference Location: Washington, DC, USA
