Short-sighted deep learning

Phys Rev E. 2020 Jul;102(1-1):013307. doi: 10.1103/PhysRevE.102.013307.

Abstract

A theory explaining how deep learning works is yet to be developed. Previous work suggests that deep learning performs a coarse graining, similar in spirit to the renormalization group (RG). This idea has been explored in the setting of a local (nearest-neighbor interactions) Ising spin lattice. We extend the discussion to the setting of a long-range spin lattice. Markov-chain Monte Carlo (MCMC) simulations determine both the critical temperature and the scaling dimensions of the system. The model is used to train both a single restricted Boltzmann machine (RBM) and a stacked RBM network. Following earlier Ising model studies, the trained weights of a single-layer RBM network define a flow of lattice models. In contrast to results for the nearest-neighbor Ising model, the RBM flow for the long-range model does not converge to the correct values of the spin and energy scaling dimensions. Further, correlation functions between visible and hidden nodes exhibit key differences between the stacked RBM and RG flows. The stacked RBM flow appears to move toward low temperature, whereas the RG flow moves toward high temperature. This again differs from results obtained for the nearest-neighbor Ising model.
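To make the MCMC setup concrete, the sampling described above can be sketched with a Metropolis update on a one-dimensional spin chain with power-law couplings. This is an illustrative sketch only: the lattice size, temperature, and the coupling form J(r) = r^(-(1+sigma)) with sigma = 0.5 are assumed here for demonstration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy parameters (not from the paper): 1D chain, power-law couplings.
N, sigma, T = 64, 0.5, 2.0

# Coupling matrix J_ij = |i - j|^{-(1+sigma)}, zero self-coupling.
idx = np.arange(N)
dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
np.fill_diagonal(dist, np.inf)   # infinite distance => J_ii = 0
J = dist ** (-(1.0 + sigma))

spins = rng.choice([-1, 1], size=N)

def metropolis_sweep(spins, J, T, rng):
    """One Metropolis sweep: propose N single-spin flips."""
    for _ in range(len(spins)):
        i = rng.integers(len(spins))
        # Energy change of flipping spin i: dE = 2 s_i * sum_j J_ij s_j
        dE = 2.0 * spins[i] * (J[i] @ spins)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i] *= -1
    return spins

for _ in range(200):
    spins = metropolis_sweep(spins, J, T, rng)

magnetization = abs(spins.mean())
```

In an actual study, observables such as the magnetization and correlation functions measured over many such sweeps (with equilibration and autocorrelation handled) would be used to locate the critical temperature and extract scaling dimensions.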
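The RBM training step can likewise be sketched with one-step contrastive divergence (CD-1) on binary data, the standard way such networks are fit to spin configurations (mapped from {-1, 1} to {0, 1}). Layer sizes, learning rate, and the random toy data below are assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

n_visible, n_hidden = 16, 8          # assumed toy sizes, not from the paper
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)            # visible biases
b_h = np.zeros(n_hidden)             # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, lr=0.05):
    """One contrastive-divergence (CD-1) update on a batch of visible samples."""
    # Up pass: hidden probabilities and a binary sample
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down pass: reconstruct visible units, then hidden probabilities again
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    # Gradient estimate: <v h> under the data minus under the reconstruction
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h

# Toy binary "spin configurations" standing in for MCMC samples.
data = (rng.random((200, n_visible)) < 0.5).astype(float)
for _ in range(20):
    W, b_v, b_h = cd1_step(data, W, b_v, b_h)
```

In the setting the abstract describes, the trained weight matrix W is then read as defining a map between lattice models, and iterating it (or stacking RBMs layer on layer) produces the "RBM flow" whose fixed points are compared against the RG flow.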