Knowledge Transfer via Decomposing Essential Information in Convolutional Neural Networks

IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):366-377. doi: 10.1109/TNNLS.2020.3027837. Epub 2022 Jan 5.

Abstract

Knowledge distillation (KD) transfers knowledge from a large "teacher" neural network to a small student network in order to improve the student's performance, and it is one of the most popular techniques for lightening convolutional neural networks (CNNs). Many KD algorithms have been proposed recently, but they still fail to properly distill the essential knowledge of the teacher network, and the transfer tends to depend on the spatial shape of the teacher's feature map. To solve these problems, we propose a method that transfers knowledge independently of the spatial shape of the teacher's feature map by extracting its major information through singular value decomposition (SVD). In addition, we present a multitask learning method that enables the student to learn the teacher's knowledge effectively by adaptively adjusting the teacher's constraints to the student's learning speed. Experimental results show that the proposed method performs 2.37% better on the CIFAR100 data set and 2.89% better on the TinyImageNet data set than the state-of-the-art method. The source code is publicly available at https://github.com/sseung0703/KD_methods_with_TF.
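The abstract describes the idea only at a high level. As an illustration, below is a minimal NumPy sketch (not the authors' implementation) of how a feature map can be decomposed with SVD so that the transferred summary no longer depends on the map's spatial size. The function name, the truncation rank k, and the use of the top-k right singular vectors weighted by their singular values are assumptions made for illustration.

```python
import numpy as np

def spatially_invariant_features(feature_map, k=4):
    """Decompose a CNN feature map of shape (H, W, C) with SVD and keep
    only the top-k singular components. The output has shape (k, C),
    regardless of H and W, so teacher and student maps with different
    spatial shapes can be compared. Illustrative sketch only; sign
    ambiguity of singular vectors and alignment between teacher and
    student components are not handled here.
    """
    h, w, c = feature_map.shape
    # Flatten spatial positions into rows: (H*W, C).
    flat = feature_map.reshape(h * w, c)
    # Thin SVD: flat = U @ diag(s) @ Vt, with Vt of shape (min(H*W, C), C).
    _, s, vt = np.linalg.svd(flat, full_matrices=False)
    # Keep the k dominant right singular vectors, scaled by their
    # singular values; this captures the map's "major information".
    return s[:k, None] * vt[:k]

# Toy usage: teacher and student maps with different spatial shapes
# still yield (k, C) summaries that can be compared directly.
teacher_map = np.random.randn(16, 16, 64)
student_map = np.random.randn(8, 8, 64)
t_feat = spatially_invariant_features(teacher_map, k=4)
s_feat = spatially_invariant_features(student_map, k=4)
distill_loss = np.mean((t_feat - s_feat) ** 2)
print(t_feat.shape, s_feat.shape, distill_loss)
```

The design point the sketch tries to convey is that after SVD the summary's shape is governed by the channel dimension and the chosen rank rather than by H and W, which is what allows the teacher's knowledge to be matched to a student whose feature maps have a different spatial resolution.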

Publication types

  • Research Support, Non-U.S. Gov't