Towards efficient network compression via Few-Shot Slimming

Neural Netw. 2022 Mar;147:113-125. doi: 10.1016/j.neunet.2021.12.011. Epub 2021 Dec 24.

Abstract

While previous network compression methods have achieved great success, most rely on abundant training data, which is often unavailable in practice for reasons such as privacy issues, storage constraints, and transmission limitations. A promising way to address this problem is to perform compression with only a few unlabeled samples. Following this direction, we propose a novel few-shot network compression framework named Few-Shot Slimming (FSS). FSS follows the student/teacher paradigm and consists of two steps: (1) construct the student by inheriting principal feature maps from the teacher; (2) refine the student's feature representation by knowledge distillation with an enhanced mixing data augmentation method called GridMix. Specifically, in the first step, we employ normalized cross correlation to perform the principal feature analysis, and theoretically derive a new indicator to select the most informative feature maps from the teacher for the student. The indicator is based on the variances of feature maps, which efficiently quantify the information richness of the input feature maps in a feature-agnostic manner. In the second step, we perform knowledge distillation on the student initialized in the first step with a novel grid-based mixing data augmentation technique that greatly expands the limited sample set. In this way, the student refines its feature representation and achieves better accuracy. Extensive experiments on multiple benchmarks demonstrate the state-of-the-art performance of FSS. For example, using only 0.2% of the full training set as label-free data, FSS yields a 60% FLOPs reduction for DenseNet-40 on CIFAR-10 with only a 0.8% loss in top-1 accuracy, a result on par with conventional full-data methods.
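
The abstract does not give the indicator's exact formula, only that it scores feature maps by their variances. The PyTorch sketch below shows one plausible reading: each channel of a teacher feature map is scored by its activation variance over a small unlabeled batch, and the highest-scoring channels are inherited by the student. The function names and the keep_ratio parameter are illustrative assumptions, not the paper's API.

    import torch

    def variance_indicator(feature_maps: torch.Tensor) -> torch.Tensor:
        # Score each channel by its activation variance over the batch
        # and spatial dimensions; higher variance is taken as richer
        # information. feature_maps: (N, C, H, W).
        c = feature_maps.shape[1]
        flat = feature_maps.permute(1, 0, 2, 3).reshape(c, -1)  # (C, N*H*W)
        return flat.var(dim=1)

    def select_principal_channels(feature_maps: torch.Tensor,
                                  keep_ratio: float) -> torch.Tensor:
        # Indices of the top-scoring channels; e.g. keep_ratio=0.4 keeps
        # roughly 40% of the teacher's channels for the student layer.
        scores = variance_indicator(feature_maps)
        k = max(1, int(feature_maps.shape[1] * keep_ratio))
        return torch.topk(scores, k).indices

Because such a score depends only on activation statistics, it needs no labels and makes no assumptions about what the features encode, matching the feature-agnostic, label-free setting the abstract describes.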
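
The abstract likewise leaves GridMix unspecified beyond "grid-based mixing". Below is a minimal sketch under the assumption that each training image is divided into a regular grid and cells are randomly swapped with those of a second image; the grid size and swap probability p are hypothetical parameters.

    import torch

    def gridmix(x1: torch.Tensor, x2: torch.Tensor,
                grid: int = 4, p: float = 0.5):
        # Mix two image batches cell by cell on a grid x grid layout.
        # Each cell is taken from x2 with probability p, otherwise from x1.
        # x1, x2: (N, C, H, W) with H and W divisible by `grid`.
        _, _, h, w = x1.shape
        mask = (torch.rand(grid, grid) < p).float()       # 1 -> use x2's cell
        mask = mask.repeat_interleave(h // grid, dim=0)
        mask = mask.repeat_interleave(w // grid, dim=1)   # now (H, W)
        mask = mask.view(1, 1, h, w)
        mixed = x1 * (1 - mask) + x2 * mask
        lam = float(1 - mask.mean())  # fraction of pixels kept from x1
        return mixed, lam

Applied repeatedly with fresh random masks, this kind of mixing can turn a handful of unlabeled images into a much larger and more varied distillation set, which is the role the abstract assigns to GridMix.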
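
Since the few-shot data carry no labels, the distillation in the second step presumably matches the student to the teacher's soft outputs alone. A standard temperature-scaled KD objective of that kind (a common formulation, not necessarily the paper's exact loss) is:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T: float = 4.0):
        # Label-free KD: pull the student's softened predictions toward
        # the teacher's; no ground-truth labels are required.
        s = F.log_softmax(student_logits / T, dim=1)
        t = F.softmax(teacher_logits / T, dim=1)
        return F.kl_div(s, t, reduction="batchmean") * (T * T)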

Keywords: Few-shot compression; Knowledge distillation; Network compression.

MeSH terms

  • Humans
  • Knowledge*