Fast convergence rates of deep neural networks for classification

Yongdai Kim; Ilsang Ohn; Dongha Kim

doi:10.1016/j.neunet.2021.02.012

Neural Netw. 2021 Jun;138:179-197. doi: 10.1016/j.neunet.2021.02.012. Epub 2021 Feb 23.

Authors

Yongdai Kim¹, Ilsang Ohn², Dongha Kim³

Affiliations

¹ Department of Statistics and Department of Data Science, Seoul National University, Seoul 08826, Republic of Korea. Electronic address: ydkim0903@gmail.com.
² Department of Applied and Computational Mathematics and Statistics, The University of Notre Dame, Indiana 46530, USA.
³ Department of Statistics, Sungshin Women's University, Seoul 02844, Republic of Korea.

PMID: 33676328
DOI: 10.1016/j.neunet.2021.02.012

Abstract

We derive fast convergence rates for a deep neural network (DNN) classifier with the rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for the true model: (1) a smooth decision boundary, (2) a smooth conditional class probability, and (3) the margin condition (i.e., the probability of inputs near the decision boundary is small). We show that the DNN classifier learned using the hinge loss achieves fast convergence rates in all three cases, provided that the architecture (i.e., the number of layers, number of nodes, and sparsity) is carefully selected. An important implication is that DNN architectures are flexible enough to be used in various cases without much modification. In addition, we consider a DNN classifier learned by minimizing the cross-entropy and show that it achieves a fast convergence rate when the noise exponent and margin exponent are large. Although these two conditions are strong, we explain that they are not unreasonable for image classification problems. To support our theoretical explanation, we present the results of a small numerical study comparing the hinge loss and the cross-entropy.
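
For readers unfamiliar with the two losses compared in the abstract, the following is a minimal sketch of their standard textbook forms for binary labels. It is not taken from the paper: the label convention y in {-1, +1}, the real-valued classifier output `score`, and the function names `hinge_loss`, `logistic_loss`, and `empirical_risk` are all assumptions made for illustration, and the paper's exact definitions may differ.

```python
import math


def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * score); assumes y is -1 or +1."""
    return max(0.0, 1.0 - y * score)


def logistic_loss(y, score):
    """Binary cross-entropy written as log(1 + exp(-y * score)).

    Assumes y is -1 or +1 and that the class probability is the
    sigmoid of `score` (an illustrative convention, not the paper's).
    """
    margin = y * score
    # Branch on the sign so that exp() never receives a large positive value.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def empirical_risk(loss, labels, scores):
    """Average of `loss` over paired labels and classifier outputs."""
    return sum(loss(y, s) for y, s in zip(labels, scores)) / len(labels)
```

Under these assumptions, the hinge loss is exactly zero once an example is classified correctly with margin at least 1, whereas the logistic form stays strictly positive for every finite score. Calling `empirical_risk(hinge_loss, labels, scores)` gives the sample average that a hinge-loss classifier would minimize.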

Keywords: Classification; Deep neural network; Excess risk; Fast convergence rate.

MeSH terms

Classification / methods*
Entropy
Machine Learning*
Probability