I. Introduction
Deep neural networks have achieved great success in computer vision applications, including image classification [1]–[4], object detection [5]–[8], and semantic segmentation [9]–[12]. However, these networks still incur enormous computation and storage costs, which severely hinders their deployment under resource constraints. Many model compression methods have been proposed, such as pruning [13]–[26], quantization [27]–[29], and lightweight network design [30]–[33]. Among them, pruning is a widely recognized and efficient approach to network compression and acceleration, which significantly improves inference efficiency by removing unimportant weights or channels.
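As a concrete illustration of channel pruning (a generic magnitude-based sketch, not the specific criterion of any cited work), the idea of removing unimportant channels can be expressed as scoring each filter and keeping only the highest-scoring ones; the function name and L1-norm criterion below are illustrative assumptions:

```python
import numpy as np

def prune_channels(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the `keep_ratio` fraction of output channels with the largest L1 norm.

    weight: convolutional filter bank of shape (out_channels, in_channels, kH, kW).
    Returns the pruned weight tensor with fewer output channels.
    """
    out_channels = weight.shape[0]
    n_keep = max(1, int(round(out_channels * keep_ratio)))
    # L1 norm of each filter serves as its importance score
    scores = np.abs(weight).reshape(out_channels, -1).sum(axis=1)
    # Indices of the n_keep highest-scoring filters, kept in original order
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return weight[keep_idx]

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w_pruned = prune_channels(w, keep_ratio=0.5)
print(w_pruned.shape)  # (4, 3, 3, 3)
```

In practice, pruning an output channel of one layer also shrinks the input dimension of the next layer, and the pruned network is typically fine-tuned to recover accuracy.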