A Flexible and Efficient FPGA Accelerator for Various Large-Scale and Lightweight CNNs


Abstract:

To enable efficient deployment of convolutional neural networks (CNNs) on embedded platforms for different computer vision applications, several convolution variants have been introduced, such as depthwise convolution (DWCV), transposed convolution (TPCV), and dilated convolution (DLCV). To address the utilization degradation that occurs when a general convolution engine executes these emerging operators, a highly flexible and reconfigurable hardware accelerator is proposed to efficiently support various CNN-based vision tasks. First, to avoid the workload imbalance of TPCV, a zero transfer and skipping (ZTS) method is proposed to reorganize the computation process. To eliminate the redundant zero calculations of TPCV and DLCV, a sparsity-alike processing (SAP) method is proposed based on a weight-oriented dataflow. Second, DWCV or pooling layers can be configured to execute directly after standard convolutions without external memory accesses. Furthermore, a programmable execution schedule is introduced for better flexibility. Finally, the proposed accelerator is evaluated on an Intel Arria 10 SoC FPGA. Experimental results show state-of-the-art performance on both large-scale and lightweight CNNs for image segmentation or classification. Specifically, the accelerator achieves a processing speed of up to 339.9 FPS and a computational efficiency of up to 0.58 GOPS/DSP, which is 3.3× better than the prior art evaluated on the same network.
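The ZTS and SAP methods target the redundant zero multiply-accumulates that TPCV (and, analogously, DLCV) introduce when mapped onto a generic convolution engine. The following NumPy sketch is purely a conceptual illustration of that intuition, not the paper's hardware dataflow: it contrasts a naive zero-insertion formulation of a 1-D stride-2 transposed convolution with an equivalent scatter formulation in which no multiplication ever touches an inserted zero. Function names, the stride, and the sample data are illustrative assumptions.

import numpy as np

def tpcv_zero_insert(x, w, stride=2):
    # Naive formulation: insert (stride - 1) zeros between input samples,
    # then run a standard full convolution. Every inserted zero still
    # costs a multiply-accumulate in a generic convolution engine.
    k = len(w)
    up = np.zeros(stride * (len(x) - 1) + 1)
    up[::stride] = x                      # zero insertion
    pad = np.pad(up, (k - 1, k - 1))
    return np.array([np.dot(pad[i:i + k], w[::-1])
                     for i in range(len(pad) - k + 1)])

def tpcv_skip_zeros(x, w, stride=2):
    # Zero-skipping formulation: scatter each real input sample's
    # contribution directly into the output, so no multiplication
    # ever involves an inserted zero.
    k = len(w)
    out = np.zeros(stride * (len(x) - 1) + k)
    for i, xi in enumerate(x):
        out[i * stride:i * stride + k] += xi * w
    return out

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 1.0, -1.0])
assert np.allclose(tpcv_zero_insert(x, w), tpcv_skip_zeros(x, w))

Both formulations produce the same output; the second simply avoids computing products that are known to be zero, which is the kind of redundancy the abstract says SAP eliminates.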
Page(s): 1185 - 1198
Date of Publication: 07 December 2021
