The training process of many deep networks explores the same low-dimensional manifold

Jialin Mao; Itay Griniasty; Han Kheng Teoh; Rahul Ramesh; Rubing Yang; Mark K Transtrum; James P Sethna; Pratik Chaudhari

doi:10.1073/pnas.2310002121

Proc Natl Acad Sci U S A. 2024 Mar 19;121(12):e2310002121. doi: 10.1073/pnas.2310002121. Epub 2024 Mar 12.

Authors

Jialin Mao¹, Itay Griniasty², Han Kheng Teoh², Rahul Ramesh³, Rubing Yang¹, Mark K Transtrum⁴, James P Sethna², Pratik Chaudhari⁵

Affiliations

¹ Applied Mathematics and Computational Sciences, University of Pennsylvania, Philadelphia, PA 19104.
² Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853.
³ Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.
⁴ Department of Physics and Astronomy, Brigham Young University, Provo, UT 84604.
⁵ Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104.

PMID: 38470929
PMCID: PMC10962999 (available on 2024-09-12)
DOI: 10.1073/pnas.2310002121

Abstract

We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. We study the details of this manifold and find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a manifold similar to that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
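
As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch, not the authors' implementation. It treats each training checkpoint as a probabilistic model (one predicted class distribution per sample), computes pairwise distances between checkpoints, and embeds them with a PCA-like eigendecomposition of the double-centered distance matrix. The abstract does not name a distance; the Bhattacharyya distance used here is an assumption, and the function names (`bhattacharyya_distance`, `embed`, `toy_trajectory`) and the synthetic trajectories are hypothetical stand-ins for real network predictions.

```python
import numpy as np

def bhattacharyya_distance(p, q, eps=1e-12):
    """Mean per-sample Bhattacharyya distance between two models.

    p, q: arrays of shape (n_samples, n_classes) whose rows sum to 1.
    (Choice of distance is an assumption; the abstract does not specify one.)
    """
    bc = np.sum(np.sqrt(p * q), axis=1)            # Bhattacharyya coefficient per sample
    return float(np.mean(-np.log(np.clip(bc, eps, 1.0))))

def embed(models):
    """Embed models of shape (n_models, n_samples, n_classes).

    Returns eigenvalues (sorted by decreasing magnitude) and coordinates.
    Eigenvalues may be negative, so magnitudes are used for sorting and scaling.
    """
    models = np.asarray(models, dtype=float)
    m = models.shape[0]
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = bhattacharyya_distance(models[i], models[j])
    J = np.eye(m) - np.ones((m, m)) / m            # centering matrix
    W = -0.5 * J @ D @ J                           # double-centered distance matrix
    evals, evecs = np.linalg.eigh(W)
    order = np.argsort(-np.abs(evals))
    evals, evecs = evals[order], evecs[:, order]
    coords = evecs * np.sqrt(np.abs(evals))        # scale each axis by sqrt(|eigenvalue|)
    return evals, coords

def toy_trajectory(labels, n_classes, rate, steps):
    """Hypothetical stand-in for training: predictions sharpen toward the labels."""
    onehot = np.eye(n_classes)[labels]
    traj = []
    for t in range(steps):
        logits = rate * t * onehot
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        traj.append(e / e.sum(axis=1, keepdims=True))
    return np.stack(traj)                          # (steps, n_samples, n_classes)

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=20)
slow = toy_trajectory(labels, 3, rate=0.5, steps=5)
fast = toy_trajectory(labels, 3, rate=1.0, steps=5)
evals, coords = embed(np.concatenate([slow, fast]))
# Fraction of total |eigenvalue| mass captured by the first three axes.
top3 = np.abs(evals[:3]).sum() / np.abs(evals).sum()
print("top-3 fraction:", top3)
```

In this toy setup the two trajectories differ only in how quickly predictions sharpen, which loosely mirrors the abstract's observation that larger networks follow a similar manifold, just faster. With real networks, `models` would instead hold the softmax outputs of saved checkpoints on a fixed set of samples.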

Keywords: deep learning; information geometry; optimization; principal component analysis; visualization.

Grants and funding

R01 NS116595/NS/NINDS NIH HHS/United States