
A 65-nm Energy-Efficient Interframe Data Reuse Neural Network Accelerator for Video Applications



Abstract:

An energy-efficient convolutional neural network (CNN) accelerator is proposed for video applications. Previous works exploited the sparsity of differential (Diff) frame activations, but the improvement is limited because much of the Diff-frame data is small yet nonzero, and processing irregular sparse data also leads to low hardware utilization. To solve these problems, this article proposes two key innovations. First, we implement a hybrid-precision inter-frame-reuse architecture that exploits both the low bit-width and the high sparsity of Diff-frame data, accelerating inference by 3.2× with no accuracy loss. Second, we design a conv-pattern-aware processing array that improves PE utilization by 2.48×–14.2× when processing sparse data under different convolution kernels. The accelerator chip was implemented in 65-nm CMOS technology; to the best of our knowledge, it is the first silicon-proven CNN accelerator that supports inter-frame data reuse. Owing to inter-frame similarity, this video CNN accelerator reaches a minimum energy consumption of 24.7 µJ/frame on the MobileNet-slim model, 76.3% less than the baseline.
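The reuse idea rests on the linearity of convolution. As a brief sketch in our own notation (not the paper's), let $a_t$ be a layer's input activation at frame $t$, $W$ its kernel, and $\Delta a_t = a_t - a_{t-1}$ the Diff-frame data; then

$$ y_t = W \ast a_t = W \ast a_{t-1} + W \ast \Delta a_t = y_{t-1} + W \ast \Delta a_t. $$

Because adjacent frames are similar, $\Delta a_t$ is mostly zero or small in magnitude, so the previous output $y_{t-1}$ can be reused and only the Diff term computed, at low bit-width and with sparsity skipping.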
Published in: IEEE Journal of Solid-State Circuits (Volume 57, Issue 8, August 2022)
Page(s): 2574–2585
Date of Publication: 1 December 2021


I. Introduction

Deep convolutional neural networks (CNNs) have made revolutionary breakthroughs in computer vision. Besides image processing, CNNs are widely used in video tasks, including object tracking [1], [2], video classification [3], and action recognition [4]. CNN models for video [5]–[7] commonly have millions of parameters and require billions of operations, resulting in long latency and short battery life on resource-constrained systems.
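To make the inter-frame reuse identity above concrete, here is a minimal NumPy sketch (our own illustration, in floating point; the function and variable names are ours, and the chip implements this with quantized, sparsity-aware hardware rather than software):

import numpy as np

def conv2d(x, w):
    # Naive "valid" 2-D cross-correlation (what CNN conv layers compute).
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
frame_prev = rng.standard_normal((8, 8))
frame_curr = frame_prev.copy()
frame_curr[2:4, 2:4] += 0.5            # only a small patch changes between frames

diff = frame_curr - frame_prev         # Diff frame: mostly exact zeros here
y_prev = conv2d(frame_prev, w)         # computed once, for the previous frame
y_reused = y_prev + conv2d(diff, w)    # inter-frame reuse: add the cheap Diff conv
y_direct = conv2d(frame_curr, w)       # conventional per-frame computation

assert np.allclose(y_reused, y_direct) # identical outputs, by linearity

In software both paths cost the same; the saving appears in hardware that skips the zeros of the Diff frame and processes its small nonzero values at reduced precision.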

References

[1] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "Fully-convolutional Siamese networks for object tracking," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 850–865.
[2] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, "High performance visual tracking with Siamese region proposal network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8971–8980.
[3] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1725–1732.
[4] L. Wang, "Temporal segment networks for action recognition in videos," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 11, pp. 2740–2755, Nov. 2019.
[5] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 568–576.
[6] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," in Proc. ICCV, 2015, pp. 4489–4497.
[7] J. Carreira and A. Zisserman, "Quo Vadis, action recognition? A new model and the Kinetics dataset," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 6299–6308.
[8] S. Zhang, "Cambricon-X: An accelerator for sparse neural networks," in Proc. 49th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Oct. 2016, pp. 1–12.
[9] Y.-H. Chen, J. Emer, and V. Sze, "Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks," in Proc. Int. Symp. Comput. Architecture, 2016, pp. 367–379.
[10] B. Moons and M. Verhelst, "A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets," in Proc. IEEE Symp. VLSI Circuits, Jun. 2016, pp. 1–2.
[11] J. Yue, "A 3.77 TOPS/W convolutional neural network processor with priority-driven kernel optimization," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 66, no. 2, pp. 277–281, Feb. 2019.
[12] D. Shin, J. Lee, J. Lee, and H. Yoo, "14.2 DNPU: An 8.1 TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 240–241.
[13] J. Lee, "UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2018, pp. 218–220.
[14] S. Yin, "A high energy efficient reconfigurable hybrid neural network processor for deep learning applications," IEEE J. Solid-State Circuits, vol. 53, no. 4, pp. 968–982, Apr. 2018.
[15] B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, "14.5 Envision: A 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28 nm FDSOI," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 246–247.
[16] Z. Yuan, "A sparse-adaptive CNN processor with area/performance balanced N-way set-associate PE arrays assisted by a collision-aware scheduler," in Proc. Asian Solid-State Circuits Conf., 2019, pp. 61–64.
[17] S. Han, "EIE: Efficient inference engine on compressed deep neural network," in Proc. ACM/IEEE 43rd Annu. Int. Symp. Comput. Architecture, Jun. 2016, pp. 243–254.
[18] J. Albericio, "Cnvlutin: Ineffectual-neuron-free deep neural network computing," in Proc. Int. Symp. Comput. Architecture, Jun. 2016, pp. 1–13.
[19] A. Parashar, "SCNN: An accelerator for compressed-sparse convolutional neural networks," ACM SIGARCH Comput. Archit. News, vol. 45, no. 2, pp. 27–40, 2017.
[20] Z. Yuan, "Sticker: A 0.41–62.1 TOPS/W 8Bit neural network processor with multi-sparsity compatible convolution arrays and online tuning acceleration for fully connected layers," in Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 33–34.
[21] J. Song, "An 11.5 TOPS/W 1024-MAC butterfly structure dual-core sparsity-aware neural processing unit in 8 nm flagship mobile SoC," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 130–132.
[22] J. Yue, "A 65 nm computing-in-memory-based CNN processor with 2.9-to-35.8 TOPS/W system energy efficiency using dynamic-sparsity performance-scaling architecture and energy-efficient inter/intra-macro data reuse," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 234–236.
[23] J. Yue, "A 65 nm 0.39-to-140.3 TOPS/W 1-to-12b unified neural network processor using block-circulant-enabled transpose-domain acceleration with 8.1× higher TOPS/mm² and 6T HBST-TRAM-based 2D data-reuse architecture," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 138–140.
[24] M. Mahmoud, K. Siu, and A. Moshovos, "Diffy: A Déjà vu-free differential deep neural network accelerator," in Proc. 51st Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Oct. 2018, pp. 134–147.
[25] Z. Zhang and V. Sze, "FAST: A framework to accelerate super-resolution processing on compressed videos," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 19–28.
[26] Y. Zhu, A. Samajdar, M. Mattina, and P. Whatmough, "Euphrates: Algorithm-SoC co-design for low-power mobile continuous vision," in Proc. ACM/IEEE 45th Annu. Int. Symp. Comput. Architecture (ISCA), Jun. 2018, pp. 547–560.
[27] B. Pan, W. Lin, X. Fang, C. Huang, B. Zhou, and C. Lu, "Recurrent residual module for fast inference in videos," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1536–1545.
[28] M. Riera, J.-M. Arnau, and A. González, "Computation reuse in DNNs by exploiting input similarity," in Proc. ACM/IEEE 45th Annu. Int. Symp. Comput. Architecture, Jun. 2018, pp. 57–68.
[29] M. Buckler, P. Bedoukian, S. Jayasuriya, and A. Sampson, "EVA²: Exploiting temporal redundancy in live computer vision," in Proc. ACM/IEEE 45th Annu. Int. Symp. Comput. Architecture (ISCA), Jun. 2018, pp. 533–546.
[30] J. Wang, "GAAS: An efficient group associated architecture and scheduler module for sparse CNN accelerators," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 39, no. 12, pp. 5170–5182, Dec. 2020.
