Abstract:
A 21mW low-power embedded Recurrent Neural Network (RNN) accelerator is proposed to realize image captioning applications. The low-power RNN operation is achieved by three key features: 1) quantization-table-based matrix multiplication with RNN weight quantization, 2) a dynamic quantization-table allocation scheme for balanced pipelined RNN operation, and 3) zero-skipped RNN operation using the quantization table. The quantization table enables a 98% reduction in multiplier operations by replacing multiplications with table references. The dynamic quantization-table allocation achieves a chip-utilization efficiency of over 90% through balanced pipeline operation across the three variations of the RNN operation. The zero-skipped RNN operation reduces the required external memory bandwidth and the quantization-table operations by 27% overall without any additional hardware cost. The proposed 1.84mm² RNN accelerator, implemented in a 65nm CMOS process, consumes 21mW and demonstrates its functionality on an image-captioning RNN.
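The quantization-table idea can be sketched briefly: because the RNN weights are quantized to a small set of levels, the products between one input activation and every weight level can be precomputed once and then referenced by index, and zero-valued inputs can skip the table entirely. The snippet below is a minimal NumPy illustration of that principle, not the chip's implementation; the uniform 16-level codebook and the function names are assumptions made for demonstration only.

```python
import numpy as np

def quantize_weights(W, num_levels=16):
    """Quantize weights to a small codebook (assumed uniform levels);
    returns per-element codebook indices and the codebook itself."""
    codebook = np.linspace(W.min(), W.max(), num_levels)
    idx = np.abs(W[..., None] - codebook).argmin(axis=-1)
    return idx.astype(np.int32), codebook

def qtable_matvec(idx, codebook, x):
    """Quantization-table-based matrix-vector product with zero-skipping.

    For each non-zero input x[j], the products x[j] * codebook[k] are
    precomputed once (the 'quantization table'); the row-wise inner
    products then become table references instead of multiplications.
    """
    out = np.zeros(idx.shape[0], dtype=np.float64)
    for j, xj in enumerate(x):
        if xj == 0.0:           # zero-skipping: no table build, no lookups
            continue
        table = xj * codebook    # one small table per non-zero input element
        out += table[idx[:, j]]  # table references replace multiplications
    return out

# Usage: approximate y = W @ x with quantized weights and table lookups.
W = np.random.randn(8, 6)
x = np.array([0.5, 0.0, -1.2, 0.0, 0.3, 0.8])  # zeros are skipped
idx, cb = quantize_weights(W)
print(qtable_matvec(idx, cb, x))
print(W @ x)  # reference result for comparison
```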
Published in: 2017 IEEE Asian Solid-State Circuits Conference (A-SSCC)
Date of Conference: 06-08 November 2017
Date Added to IEEE Xplore: 28 December 2017