I. Introduction
Deep neural networks have achieved great success in computer vision applications, including image classification [1]–[4], object detection [5]–[8], and semantic segmentation [9]–[12]. However, these networks still incur enormous computation and storage costs, which severely hinders their deployment under resource constraints. Many model compression methods have been proposed, such as pruning [13]–[26], quantization [27]–[29], and lightweight network design [30]–[33]. Among them, pruning is a widely recognized and efficient approach to network compression and acceleration, which significantly improves inference efficiency by removing unimportant weights or channels.
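As a concrete illustration of channel pruning (a generic magnitude-based sketch, not the specific criterion of any cited work), the idea of removing unimportant channels can be expressed as scoring each filter and keeping only the highest-scoring ones; the function name and L1-norm criterion below are illustrative assumptions:

```python
import numpy as np

def prune_channels(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the `keep_ratio` fraction of output channels with the largest L1 norm.

    weight: convolutional filter bank of shape (out_channels, in_channels, kH, kW).
    Returns the pruned weight tensor with fewer output channels.
    """
    out_channels = weight.shape[0]
    n_keep = max(1, int(round(out_channels * keep_ratio)))
    # L1 norm of each filter serves as its importance score
    scores = np.abs(weight).reshape(out_channels, -1).sum(axis=1)
    # Indices of the n_keep highest-scoring filters, kept in original order
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return weight[keep_idx]

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w_pruned = prune_channels(w, keep_ratio=0.5)
print(w_pruned.shape)  # (4, 3, 3, 3)
```

In practice, pruning an output channel of one layer also shrinks the input dimension of the next layer, and the pruned network is typically fine-tuned to recover accuracy.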