FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Block-circulant matrix (BCM) compression has garnered much attention in the hardware acceleration of convolutional neural networks (CNNs) due to its regularity and efficiency. However, because the compression parameter space is difficult to explore, existing BCM-based methods often apply a uniform compression parameter to all layers of a CNN model, sacrificing the flexibility of the compression. Additionally, optimizing models and accelerators independently makes it challenging to achieve the optimal tradeoff between model accuracy and hardware efficiency. To this end, we propose FlexBCM, a joint exploration framework that efficiently explores both the compression parameter space and the hardware parameter space to generate customized hybrid BCM-compressed CNN and field-programmable gate array (FPGA) accelerator solutions. On the algorithmic side, leveraging the idea of neural architecture search (NAS), we design an efficient differentiable sampling method to rapidly evaluate the accuracy of candidate subnets. Additionally, we devise a hardware-friendly frequency-domain quantization scheme for BCM computation. On the hardware side, we develop an efficient, parameter-configurable convolutional core (ConvPU) alongside a BCM computing core (BCMPU). The BCMPU flexibly accommodates different compression parameters at runtime and incorporates complex-number DSP packing and conjugate symmetry optimizations. For model-to-hardware evaluation, we construct accurate latency and resource consumption models. Moreover, we design a fast hardware generation algorithm based on coarse-grained search to provide prompt feedback on the hardware evaluation of the current subnet. Finally, we validate FlexBCM on the Xilinx ZCU102 FPGA and compare its compressed CNN-accelerator solutions with previous state-of-the-art works. Experimental results demonstrate that FlexBCM achieves 1.21–3.02 times higher computational efficiency for ResNet18 and ResNet34 models while maintaining an acceptable accuracy...
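
As background for the BCM computation and the conjugate symmetry optimization mentioned in the abstract, the following is a minimal, generic sketch of a block-circulant matrix-vector product evaluated in the frequency domain. It is not FlexBCM's implementation: the function names (`dft`, `circulant_matvec_direct`, `circulant_matvec_freq`, `bcm_matvec`) are hypothetical, a naive DFT stands in for a hardware FFT, and quantization and DSP packing are left out. The block size `b` plays the role of the compression parameter: each `b x b` circulant block is stored as a single length-`b` vector, so a weight matrix needs roughly `1/b` of the dense storage.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circulant_matvec_direct(w, x):
    """y = C x, where C is the circulant matrix whose first column is w."""
    n = len(w)
    return [sum(w[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def circulant_matvec_freq(w, x):
    """Same product computed as an element-wise multiply of spectra.

    For real w and x, bin n-k is the complex conjugate of bin k, so only
    bins 0..n//2 are multiplied and the remaining bins are mirrored.
    """
    n = len(w)
    W, X = dft(w), dft(x)
    half = n // 2 + 1
    Y = [W[k] * X[k] for k in range(half)]
    Y += [Y[n - k].conjugate() for k in range(half, n)]
    # Inverse DFT; the result is real up to rounding error.
    return [sum(Y[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def bcm_matvec(blocks, x, b):
    """Block-circulant product; blocks[i][j] is the length-b vector
    that defines circulant block (i, j)."""
    y = []
    for row in blocks:
        acc = [0.0] * b
        for j, w in enumerate(row):
            part = circulant_matvec_freq(w, x[j * b:(j + 1) * b])
            acc = [a + v for a, v in zip(acc, part)]
        y.extend(acc)
    return y
```

In a hybrid scheme of the kind the abstract describes, `b` would be chosen per layer rather than fixed for the whole network; how FlexBCM selects it is not shown here.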
Page(s): 3852 - 3863
Date of Publication: 06 November 2024

School of Software Engineering, University of Science and Technology of China, Hefei, China
Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China
School of Computer Science, University of Science and Technology of China, Hefei, China
School of Computer Science, University of Science and Technology of China, Hefei, China
School of Computer Science, University of Science and Technology of China, Hefei, China
School of Computer Science, University of Science and Technology of China, Hefei, China
School of Computer Science, University of Science and Technology of China, Hefei, China