Theory of adaptive SVD regularization for deep neural networks

Neural Netw. 2020 Aug:128:33-46. doi: 10.1016/j.neunet.2020.04.021. Epub 2020 Apr 25.

Abstract

Deep networks can learn complex problems; however, they suffer from overfitting. The regularization methods proposed to address this problem are not adaptable to the dynamic changes of the training process. Taking a different approach, this paper presents a regularization method based on the Singular Value Decomposition (SVD) that adjusts the learning model adaptively. To this end, overfitting is evaluated by the condition numbers of the synaptic weight matrices. When overfitting is high, the matrices are substituted with their SVD approximations. Theoretical results are derived to characterize the performance of this regularization, and it is proved that SVD approximation alone cannot resolve overfitting after several iterations. Therefore, a new Tikhonov term is added to the loss function to drive the synaptic weights toward the SVD approximation of the best-found results. Following this approach, an Adaptive SVD Regularization (ASR) is proposed that adjusts the learning model with respect to the dynamic characteristics of training. ASR results are visualized to show how the method overcomes overfitting. Different configurations of Convolutional Neural Networks (CNNs) are implemented with different augmentation schemes to compare ASR with state-of-the-art regularization methods. The results show that on MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100, the accuracies of ASR are 99.4%, 95.7%, 97.1%, 93.2% and 55.6%, respectively. Although ASR reduces overfitting and validation loss, its elapsed time is not significantly greater than that of training without regularization.
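The abstract describes two ingredients: a condition-number test that triggers substitution of a weight matrix by its SVD approximation, and a Tikhonov term that pulls the weights toward the SVD approximation of the best-found result. The following is a minimal NumPy sketch of these ideas, for illustration only; the threshold `cond_threshold`, the rank `k`, and the coefficient `lam` are hypothetical hyperparameters and are not taken from the paper.

```python
import numpy as np

def truncated_svd_approx(W, k):
    """Rank-k SVD approximation of a weight matrix W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def asr_substitute(W, cond_threshold=1e3, k=None):
    """If the condition number of W signals overfitting, return a
    truncated-SVD substitute; otherwise return W unchanged.
    cond_threshold and k are illustrative, not values from the paper."""
    s = np.linalg.svd(W, compute_uv=False)      # singular values, descending
    cond = s[0] / max(s[-1], 1e-12)             # condition number sigma_max / sigma_min
    if cond > cond_threshold:
        k = k if k is not None else max(1, len(s) // 2)
        return truncated_svd_approx(W, k)
    return W

def tikhonov_penalty(W, W_best_approx, lam=1e-3):
    """Tikhonov-style term lam * ||W - W_svd||_F^2 that drives the weights
    toward the SVD approximation of the best-found result."""
    return lam * np.sum((W - W_best_approx) ** 2)
```

In a training loop, such a penalty would simply be added to the task loss, while the substitution step would be applied to the weight matrices whenever the condition-number test indicates overfitting.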

Keywords: Adaptive regularization; Deep networks; Matrix decomposition; Overfitting; Singular value decomposition.

MeSH terms

  • Neural Networks, Computer*