Model Compression Based on Differentiable Network Channel Pruning

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10203-10212. doi: 10.1109/TNNLS.2022.3165123. Epub 2023 Nov 30.

Abstract

Although neural networks have achieved great success in various fields, their deployment on mobile devices is limited by the computational and storage costs of large models. Model compression (neural network pruning) techniques can significantly reduce the number of network parameters and improve computational efficiency. In this article, we propose a differentiable network channel pruning (DNCP) method for model compression. Unlike existing methods that require sampling and evaluating a large number of substructures, our method can efficiently search for an optimal substructure that meets resource constraints (e.g., FLOPs) through gradient descent. Specifically, we assign a learnable probability to each possible number of channels in each layer of the network, relax the selection of a particular number of channels to a softmax over all possible numbers of channels, and optimize the learnable probabilities in an end-to-end manner through gradient descent. After the network parameters are optimized, we prune the network according to the learnable probabilities to obtain the optimal substructure. To demonstrate the effectiveness and efficiency of DNCP, experiments are conducted with ResNet and MobileNet V2 on the CIFAR, Tiny ImageNet, and ImageNet datasets.
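To make the softmax relaxation described above concrete, the following is a minimal PyTorch sketch of the general idea: each layer holds learnable logits over a set of candidate channel counts, the channel selection is relaxed to a probability-weighted mixture of channel masks during training, and after optimization the most probable candidate is kept. All class names, candidate values, and implementation details here are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifferentiableChannelChoice(nn.Module):
    """Sketch: relax the choice among candidate channel counts with a
    softmax over learnable logits, so it can be trained by gradient descent."""

    def __init__(self, max_channels, candidates):
        super().__init__()
        self.candidates = candidates                       # e.g., [8, 16, 24, 32]
        self.logits = nn.Parameter(torch.zeros(len(candidates)))
        # Binary masks that keep the first c channels for each candidate c.
        masks = torch.zeros(len(candidates), max_channels)
        for i, c in enumerate(candidates):
            masks[i, :c] = 1.0
        self.register_buffer("masks", masks)

    def forward(self, x):
        # x: (N, max_channels, H, W). Mix the channel masks by their softmax
        # probabilities so the selection remains differentiable.
        probs = F.softmax(self.logits, dim=0)              # (num_candidates,)
        soft_mask = (probs[:, None] * self.masks).sum(0)   # (max_channels,)
        return x * soft_mask.view(1, -1, 1, 1)

    def prune_choice(self):
        # After training, keep the candidate with the highest probability.
        return self.candidates[int(self.logits.argmax())]


if __name__ == "__main__":
    conv = nn.Conv2d(3, 32, kernel_size=3, padding=1)
    choice = DifferentiableChannelChoice(max_channels=32, candidates=[8, 16, 24, 32])
    y = choice(conv(torch.randn(2, 3, 16, 16)))
    print(y.shape, choice.prune_choice())
```

In a full method, the logits would be trained jointly with the network weights, typically with an additional resource term (e.g., expected FLOPs under the channel probabilities) added to the loss so that the searched substructure satisfies the stated constraints.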