Towards performance-maximizing neural network pruning via global channel attention

Neural Netw. 2024 Mar:171:104-113. doi: 10.1016/j.neunet.2023.11.065. Epub 2023 Dec 1.

Abstract

Network pruning has attracted increasing attention for its ability to deploy large-scale neural networks (e.g., CNNs) on resource-constrained devices. Such deployment is typically achieved by removing redundant network parameters while retaining generalization performance, in either a static or a dynamic manner. Static pruning removes the same channels for all samples, yielding a single fit-to-all compressed network that is usually larger and cannot fully exploit the redundancy in the given network. In contrast, dynamic pruning adaptively removes (more) different channels for different samples and achieves state-of-the-art performance together with a higher compression ratio. However, because the complete network must be preserved for sample-specific pruning, dynamic pruning methods are usually not memory-efficient. In this paper, we explore a static alternative, dubbed GlobalPru, which nonetheless respects the differences among data. Specifically, we propose a channel attention-based learn-to-rank framework that learns a global ranking of channels with respect to network redundancy: each sample-wise (local) channel attention is forced to reach agreement on the global ranking across different data. Hence, all samples empirically share the same channel ranking, and pruning can be performed statically in practice. Extensive experiments on ImageNet, SVHN, and CIFAR-10/100 demonstrate that GlobalPru outperforms state-of-the-art static and dynamic pruning methods by significant margins.
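The sketch below is an illustrative reading of the idea described in the abstract, not the authors' implementation: per-sample channel attention scores are aggregated over a calibration set into a single global channel ranking, which is then used to prune channels statically for all samples. Names such as `AttnConvBlock`, `calib_loader`, and `prune_ratio` are hypothetical.

```python
import torch
import torch.nn as nn

class AttnConvBlock(nn.Module):
    """Conv block with a squeeze-and-excitation-style channel attention head."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.attn = nn.Sequential(               # one attention score per channel
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_ch, out_ch),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = torch.relu(self.bn(self.conv(x)))
        s = self.attn(y)                         # (batch, out_ch) sample-wise scores
        self.last_scores = s.detach()            # cached for global aggregation
        return y * s.unsqueeze(-1).unsqueeze(-1)

def global_channel_ranking(block, calib_loader, device="cpu"):
    """Average per-sample attention over a calibration set into a global ranking."""
    totals, count = None, 0
    block.eval()
    with torch.no_grad():
        for x, _ in calib_loader:
            block(x.to(device))
            s = block.last_scores.sum(dim=0)
            totals = s if totals is None else totals + s
            count += block.last_scores.size(0)
    mean_scores = totals / count
    return torch.argsort(mean_scores)            # ascending: least important first

def static_channel_mask(ranking, prune_ratio):
    """Zero out the lowest-ranked channels for all samples (static pruning)."""
    n = ranking.numel()
    mask = torch.ones(n)
    mask[ranking[: int(prune_ratio * n)]] = 0.0
    return mask
```

In this reading, the mask derived from the shared ranking is applied identically to every input, which is what makes the pruning static despite the attention scores being computed per sample.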

Keywords: Channel pruning; Edge computing; Global attention; Learn-to-rank; Model compression.

MeSH terms

  • Data Compression*
  • Generalization, Psychological
  • Learning
  • Neural Networks, Computer