Consideration on the learning efficiency of multiple-layered neural networks with linear units

Neural Netw. 2024 Apr:172:106132. doi: 10.1016/j.neunet.2024.106132. Epub 2024 Jan 17.

Abstract

In the last two decades, remarkable progress has been made in the theory of singular learning machines on the basis of algebraic geometry. This theory reveals that, to analyze the asymptotic behavior of state probability functions as the number of data increases, we need to find resolution maps of singularities. In particular, it is essential to construct normal crossing divisors of the average log loss function. However, few examples of such constructions are known for singular models. In this paper, we determine the resolution map and the normal crossing divisors for multiple-layered neural networks with linear units. Moreover, we obtain the exact values of the learning efficiency, the so-called learning coefficients. Multiple-layered neural networks with linear units are simple but very important models, because they extract the essential information from data of input-output pairs; moreover, they are very close to multiple-layered neural networks with rectified linear units (ReLU). We show that the learning coefficients of multiple-layered neural networks with linear units remain bounded even as the number of layers goes to infinity, which means that the main terms of the asymptotic expansions of the free energy and the generalization error of these singular models are much smaller than the dimension of the parameter space.
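
For readers outside singular learning theory: the learning coefficient appears as the coefficient of the log n term in the asymptotic expansion of the Bayes free energy, a standard result of Watanabe's framework that the abstract presupposes (this context is not stated in the abstract itself). In LaTeX notation, with \lambda the learning coefficient, m its multiplicity, n the number of data, and S_n the empirical log loss of the true distribution,

    F_n = n S_n + \lambda \log n - (m - 1) \log \log n + O_p(1),
    \qquad
    \mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),

where F_n is the Bayes free energy and G_n the Bayes generalization error. For a regular statistical model, \lambda = d/2 with d the dimension of the parameter space; the boundedness result above thus says that for deep linear networks \lambda stays far below d/2 as depth grows.

The degeneracy that makes these networks singular can also be seen directly: the end-to-end map of a linear network depends only on the product of its weight matrices, so a continuum of parameter points realizes the same function and the Fisher information matrix degenerates. A minimal sketch in Python (dimensions and variable names are illustrative, not taken from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 4  # illustrative layer width

    # A three-layer linear network f(x) = W3 W2 W1 x.
    W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))
    x = rng.normal(size=d)

    # The network computes only the end-to-end product P.
    P = W3 @ W2 @ W1
    assert np.allclose(W3 @ (W2 @ (W1 @ x)), P @ x)

    # Reparameterize with any invertible A: (W3 A^{-1})(A W2) W1 is a
    # different parameter point realizing exactly the same function, so
    # the parameter-to-function map is not one-to-one (the model is singular).
    A = rng.normal(size=(d, d)) + d * np.eye(d)  # shifted to keep A invertible
    assert np.allclose((W3 @ np.linalg.inv(A)) @ (A @ W2) @ W1, P)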

Keywords: Algebraic geometry; Multiple-layered neural networks with linear units; Resolution map; Singular learning theory.

MeSH terms

  • Generalization, Psychological*
  • Likelihood Functions
  • Neural Networks, Computer*