I. Introduction
Deep convolutional neural networks (CNNs) have made a revolutionary breakthrough in the computer vision area. Beside image processing, CNNs are widely used in video tasks, including object tracking [1], [2], video classification [3], and action recognition [4]. CNN models for video [5]–[7] commonly have millions of parameters and billions of operations, which results in long latency time and low battery life on the resource-constrained system.