I. Introduction
Recent years have witnessed the steady improvement of convolutional neural networks (CNNs) and their strong performance in many applications, such as computer vision and natural language processing [1]. However, the high accuracy of modern CNNs comes at the cost of a large number of weights, which leads to massive data movement between memory and computing units. This data movement, the so-called von Neumann bottleneck, causes enormous energy consumption and hinders the deployment of CNNs on conventional hardware platforms and resource-constrained devices.