
Model Compression Based on Differentiable Network Channel Pruning


Abstract:

Although neural networks have achieved great success in various fields, their application on mobile devices is limited by the computational and storage costs required by large models. Model compression (neural network pruning) can significantly reduce network parameters and improve computational efficiency. In this article, we propose a differentiable network channel pruning (DNCP) method for model compression. Unlike existing methods that require sampling and evaluating a large number of substructures, our method can efficiently search for the optimal substructure that meets resource constraints (e.g., FLOPs) through gradient descent. Specifically, we assign a learnable probability to each possible number of channels in each layer of the network, relax the selection of a particular number of channels to a softmax over all possible numbers of channels, and optimize the learnable probabilities in an end-to-end manner through gradient descent. After the network parameters are optimized, we prune the network according to the learnable probabilities to obtain the optimal substructure. To demonstrate the effectiveness and efficiency of DNCP, experiments are conducted with ResNet and MobileNet V2 on the CIFAR, Tiny ImageNet, and ImageNet datasets.
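The following is a minimal PyTorch sketch of the idea described in the abstract: each layer keeps learnable logits over candidate channel counts, the hard choice of a channel count is relaxed to a softmax-weighted mixture of channel masks, and these logits are trained jointly with the network weights by gradient descent, after which the most probable count is kept. The class and parameter names (DifferentiableChannelGate, candidate_counts, the toy loss) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of softmax-relaxed channel selection, not the DNCP code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifferentiableChannelGate(nn.Module):
    """Relaxes the choice of 'keep the first k channels' to a softmax over k."""

    def __init__(self, out_channels, candidate_counts):
        super().__init__()
        self.candidate_counts = candidate_counts
        # One learnable logit per candidate number of channels.
        self.logits = nn.Parameter(torch.zeros(len(candidate_counts)))
        # Precompute a binary mask per candidate count: masks[i, c] = 1
        # iff channel c is kept when candidate_counts[i] channels are retained.
        masks = torch.zeros(len(candidate_counts), out_channels)
        for i, k in enumerate(candidate_counts):
            masks[i, :k] = 1.0
        self.register_buffer("masks", masks)

    def forward(self, x):
        # Softmax over candidate counts -> soft, differentiable channel mask.
        probs = F.softmax(self.logits, dim=0)          # (num_candidates,)
        soft_mask = probs @ self.masks                 # (out_channels,)
        return x * soft_mask.view(1, -1, 1, 1)

    def expected_channels(self):
        # Differentiable surrogate for the kept-channel count; a resource
        # (e.g., FLOPs) regularizer can be built from terms like this.
        probs = F.softmax(self.logits, dim=0)
        counts = torch.tensor(self.candidate_counts, dtype=probs.dtype,
                              device=probs.device)
        return (probs * counts).sum()

    def selected_channels(self):
        # After training, keep the candidate count with the highest probability.
        return self.candidate_counts[int(self.logits.argmax())]


if __name__ == "__main__":
    conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
    gate = DifferentiableChannelGate(64, candidate_counts=[16, 32, 48, 64])
    x = torch.randn(2, 3, 32, 32)
    y = gate(conv(x))
    # Toy objective: a placeholder task loss plus a resource term that pushes
    # the expected channel count down.
    loss = y.pow(2).mean() + 1e-3 * gate.expected_channels()
    loss.backward()
    print("expected channels:", float(gate.expected_channels()))
    print("channels kept after argmax:", gate.selected_channels())
```

In this sketch the final pruned width of the layer is read off with an argmax over the learned probabilities, mirroring the paper's step of pruning according to the learnable probabilities once training finishes.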
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 34, Issue: 12, December 2023)
Page(s): 10203 - 10212
Date of Publication: 15 April 2022


PubMed ID: 35427225

I. Introduction

Deep neural networks have achieved great success in computer vision applications, including image classification [1]–[4], object detection [5]–[8], and semantic segmentation [9]–[12]. However, these networks incur huge computational and storage costs, which severely hinder their deployment under resource constraints. Many model compression methods have been proposed, such as pruning [13]–[26], quantization [27]–[29], and lightweight network design [30]–[33]. Among them, pruning is a widely recognized and efficient approach to network compression and acceleration, which significantly improves the inference efficiency of the network by removing unimportant weights or channels.
