Abstract:
Convolutional neural networks (CNNs) have been implemented with custom hardware on edge devices because their algorithms have proven successful in many artificial intelligence applications. Although many unstructured pruning and mixed-bit quantization algorithms have been proposed to compress CNNs, few hardware accelerators support both sparse and mixed-bit CNNs. In addition, sparse matrix computation consumes substantial hardware resources, such as registers or BRAM, to fetch the required input activations into the processing elements (PEs). This brief presents a tiny accelerator for mixed-bit sparse CNNs that combines a novel single-vector-based compressed sparse filter (CSF) method with a single-input multiple-output scratch pad (SIMO SPad) to compress weights effectively and fetch the required input activations. The SIMO SPad is shared by multiple PEs, which saves 13.34% of CLB LUTs and 46.24% of CLB registers. Furthermore, the accelerator supports mixed-bit sparse computation to obtain better accuracy and performance. When tested on VGG16 against 8-bit non-sparse baselines, mixed-bit sparse computation improves performance on CIFAR-10 and ImageNet by 4.85× and 3.33×, respectively, with only a small accuracy degradation. Compared with state-of-the-art accelerators, it achieves 1.40× to 2.98× higher DSP efficiency and 1.91× higher energy efficiency.
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs (Volume: 70, Issue: 8, August 2023)