Abstract:
The wide adoption of convolutional neural networks (CNNs) in many applications has given rise to unrelenting computational demand and memory requirements. Computing-in-Memory (CIM) architecture has demonstrated great potential to break the memory wall and effectively execute CNN workloads. Ongoing research focuses on pruning or quantizing CNNs to achieve higher efficiency on CIM. However, prior works treat the two techniques separately, precluding their integration in a single framework. Moreover, directly incorporating energy estimation into the model compression process has not been well explored in the literature. In this paper, we present an Energy-aware Unified Pruning-Quantization (E-UPQ) mechanism, a novel framework for automated compression (pruning + quantization) of CNNs while considering the energy-accuracy trade-off. Specifically, E-UPQ interweaves pruning and quantization seamlessly by viewing pruning as a special case of "0-bit" quantization during the mixed-precision search. In addition, E-UPQ introduces a set of trainable parameters to incorporate energy information during the compression process, closing the gap between compression policy and energy optimization. Experimental results evaluated on DNN+NeuroSim show that E-UPQ reduces energy consumption by up to 79.3% and 66.6% for VGG-16 and ResNet-18, respectively, compared with the state-of-the-art work, while achieving similar accuracy on CIFAR-100. Layer-wise analysis and ablation studies are provided to validate the effectiveness of E-UPQ. We also present the corresponding CIM architecture to support the proposed E-UPQ framework.
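To make the core idea concrete, the sketch below illustrates how pruning can be folded into a differentiable mixed-precision search as a "0-bit" candidate, with an energy term driven by trainable selection parameters. This is a minimal illustration consistent with the abstract's description, not the authors' implementation: the names (MixedPrecisionConv, fake_quant, energy_cost), the candidate bit-widths, the softmax relaxation, and the linear bit-width energy model are all assumptions for exposition.

```python
# Hedged sketch: pruning as "0-bit" quantization inside a trainable
# mixed-precision search, with an energy penalty on the selection logits.
# All names and the energy model here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

BITS = torch.tensor([0.0, 2.0, 4.0, 8.0])  # candidate bit-widths; 0 = pruned

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization; 0 bits zeroes the weights (pruning)."""
    if bits == 0:
        return torch.zeros_like(w)
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

class MixedPrecisionConv(nn.Module):
    """Conv layer whose precision (including the 0-bit prune option) is learned."""
    def __init__(self, cin, cout, k):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, k, padding=k // 2)
        # Trainable selection logits: one per candidate bit-width.
        self.alpha = nn.Parameter(torch.zeros(len(BITS)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        # Differentiable soft mixture over the quantized (and pruned) branches.
        w = self.conv.weight
        w_mix = sum(p * fake_quant(w, int(b)) for p, b in zip(probs, BITS))
        return F.conv2d(x, w_mix, self.conv.bias, padding=self.conv.padding)

def energy_cost(layers, unit_energy=1.0):
    """Expected energy under the assumption that cost grows with bit-width."""
    cost = 0.0
    for layer in layers:
        probs = F.softmax(layer.alpha, dim=0)
        cost = cost + unit_energy * (probs * BITS).sum()
    return cost

# Usage: the energy term joins the task loss, so the compression policy
# (which bit-width, or pruning, each layer selects) is optimized jointly
# with accuracy rather than decided post hoc.
layer = MixedPrecisionConv(3, 16, 3)
x = torch.randn(1, 3, 32, 32)
out = layer(x)
loss = out.pow(2).mean() + 0.01 * energy_cost([layer])
loss.backward()
```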
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (Volume: 13, Issue: 1, March 2023)