FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs

Abstract:

Block-circulant matrix (BCM) compression has garnered much attention in the hardware acceleration of convolutional neural networks (CNNs) due to its regularity and efficiency. However, because the compression parameter space is difficult to explore, existing BCM-based methods often apply a uniform compression parameter to every layer of a CNN model, sacrificing the flexibility of the compression. Additionally, optimizing models and accelerators independently makes it challenging to achieve the optimal tradeoff between model accuracy and hardware efficiency. To this end, we propose FlexBCM, a joint exploration framework that efficiently explores both the compression parameter space and the hardware parameter space to generate customized hybrid BCM-compressed CNN and field-programmable gate array (FPGA) accelerator solutions. On the algorithmic side, leveraging the idea of neural architecture search (NAS), we design an efficient differentiable sampling method to rapidly evaluate the accuracy of candidate subnets. Additionally, we devise a hardware-friendly frequency-domain quantization scheme for BCM computation. On the hardware side, we develop an efficient, parameter-configurable convolutional core (ConvPU) alongside a BCM computing core (BCMPU). The BCMPU flexibly accommodates different compression parameters at runtime and incorporates complex-number DSP packing and conjugate symmetry optimizations. For model-to-hardware evaluation, we construct accurate latency and resource consumption models. Moreover, we design a fast hardware generation algorithm based on coarse-grained search to provide prompt feedback on the hardware evaluation of the current subnet. Finally, we validate FlexBCM on the Xilinx ZCU102 FPGA and compare its compressed CNN-accelerator solutions with previous state-of-the-art works. Experimental results demonstrate that FlexBCM achieves 1.21–3.02× higher computational efficiency for ResNet18 and ResNet34 models while maintaining an acceptable accuracy...
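For readers unfamiliar with BCM compression, the following is a minimal sketch of the standard idea the abstract relies on, not the FlexBCM implementation. A circulant block of size b is defined by a single length-b vector (b values instead of b²), and its matrix-vector product is a circular convolution, so it can be computed as an element-wise product in the frequency domain. For real inputs the spectrum is conjugate symmetric, which is the property the conjugate symmetry optimization mentioned above exploits. All function names here (`dft`, `circulant_matvec_direct`, `circulant_matvec_freq`, `bcm_matvec`) are hypothetical, and the naive O(n²) transform stands in for the FFT that a real design would use.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; a real design would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def circulant_matvec_direct(w, x):
    """Reference product y = C x, where C[i][j] = w[(i - j) mod b] (w is the first column)."""
    b = len(w)
    return [sum(w[(i - j) % b] * x[j] for j in range(b)) for i in range(b)]


def circulant_matvec_freq(w, x):
    """Same product via the convolution theorem: y = IDFT(DFT(w) * DFT(x))."""
    spec_w, spec_x = dft(w), dft(x)
    spec_y = [p * q for p, q in zip(spec_w, spec_x)]
    # Inputs are real, so the imaginary part of the result is only rounding noise.
    return [v.real for v in dft(spec_y, inverse=True)]


def bcm_matvec(blocks, x, b):
    """Block-circulant product; blocks[p][q] is the defining vector of block (p, q)."""
    out = []
    for row in blocks:
        acc = [0.0] * b
        for q, w in enumerate(row):
            part = circulant_matvec_freq(w, x[q * b:(q + 1) * b])
            acc = [a + v for a, v in zip(acc, part)]
        out.extend(acc)
    return out


if __name__ == "__main__":
    w = [1.0, 2.0, 3.0, 4.0]
    x = [0.5, -1.0, 2.0, 0.0]
    print(circulant_matvec_direct(w, x))
    print(circulant_matvec_freq(w, x))
    spectrum = dft(x)
    # Conjugate symmetry of a real signal: X[k] == conj(X[n - k]).
    print(abs(spectrum[1] - spectrum[3].conjugate()) < 1e-9)
```

Under these assumptions, a hybrid scheme in the sense of the abstract would simply allow a different block size b per layer; the sketch fixes one b per call for brevity.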

Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 43, Issue: 11, November 2024)

Page(s): 3852 - 3863

Date of Publication: 06 November 2024

DOI: 10.1109/TCAD.2024.3439488

Wenqi Lou

School of Software Engineering, University of Science and Technology of China, Hefei, China

Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China

Yunji Qin

School of Computer Science, University of Science and Technology of China, Hefei, China

Xuan Wang

School of Computer Science, University of Science and Technology of China, Hefei, China

Lei Gong

School of Computer Science, University of Science and Technology of China, Hefei, China

Chao Wang

School of Computer Science, University of Science and Technology of China, Hefei, China

Xuehai Zhou

School of Computer Science, University of Science and Technology of China, Hefei, China
