Towards performance-maximizing neural network pruning via global channel attention

Neural Netw. 2024 Mar:171:104-113. doi: 10.1016/j.neunet.2023.11.065. Epub 2023 Dec 1.

Abstract

Network pruning has attracted increasing attention for its ability to deploy large-scale neural networks (e.g., CNNs) on resource-constrained devices. Such deployment is typically achieved by removing redundant network parameters while retaining generalization performance, in either a static or a dynamic manner. Static pruning removes the same channels for all samples, yielding a single fit-to-all compressed network that is usually larger and cannot fully exploit the redundancy in the given network. In contrast, dynamic pruning adaptively removes (more) different channels for different samples and achieves state-of-the-art performance together with a higher compression ratio. However, because the complete network must be preserved for sample-specific pruning, dynamic pruning methods are usually not memory-efficient. In this paper, we explore a static alternative, dubbed GlobalPru, which nonetheless respects the differences among data. Specifically, we propose a channel attention-based learn-to-rank framework that learns a global ranking of channels with respect to network redundancy: each sample-wise (local) channel attention is forced to reach agreement on the global ranking across different data. Hence, all samples empirically share the same channel ranking, and pruning can be performed statically in practice. Extensive experiments on ImageNet, SVHN, and CIFAR-10/100 demonstrate that GlobalPru outperforms state-of-the-art static and dynamic pruning methods by significant margins.
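The sketch below is an illustrative reading of the idea described in the abstract, not the authors' implementation: per-sample channel attention scores are aggregated over a calibration set into a single global channel ranking, which is then used to prune channels statically for all samples. Names such as `AttnConvBlock`, `calib_loader`, and `prune_ratio` are hypothetical.

```python
import torch
import torch.nn as nn

class AttnConvBlock(nn.Module):
    """Conv block with a squeeze-and-excitation-style channel attention head."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.attn = nn.Sequential(               # one attention score per channel
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_ch, out_ch),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = torch.relu(self.bn(self.conv(x)))
        s = self.attn(y)                         # (batch, out_ch) sample-wise scores
        self.last_scores = s.detach()            # cached for global aggregation
        return y * s.unsqueeze(-1).unsqueeze(-1)

def global_channel_ranking(block, calib_loader, device="cpu"):
    """Average per-sample attention over a calibration set into a global ranking."""
    totals, count = None, 0
    block.eval()
    with torch.no_grad():
        for x, _ in calib_loader:
            block(x.to(device))
            s = block.last_scores.sum(dim=0)
            totals = s if totals is None else totals + s
            count += block.last_scores.size(0)
    mean_scores = totals / count
    return torch.argsort(mean_scores)            # ascending: least important first

def static_channel_mask(ranking, prune_ratio):
    """Zero out the lowest-ranked channels for all samples (static pruning)."""
    n = ranking.numel()
    mask = torch.ones(n)
    mask[ranking[: int(prune_ratio * n)]] = 0.0
    return mask
```

In this reading, the mask derived from the shared ranking is applied identically to every input, which is what makes the pruning static despite the attention scores being computed per sample.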

Keywords: Channel pruning; Edge computing; Global attention; Learn-to-rank; Model compression.

MeSH terms

  • Data Compression*
  • Generalization, Psychological
  • Learning
  • Neural Networks, Computer