Markov chain stochastic DCA and applications in deep learning with PDEs regularization

Hoang Phuc Hau Luu; Hoai Minh Le; Hoai An Le Thi

doi:10.1016/j.neunet.2023.11.032

Neural Netw. 2024 Feb:170:149-166. doi: 10.1016/j.neunet.2023.11.032. Epub 2023 Nov 13.

Authors

Hoang Phuc Hau Luu¹, Hoai Minh Le², Hoai An Le Thi³

Affiliations

¹ Université de Lorraine, LGIPM, Metz, 57000, France. Electronic address: hoang-phuc-hau.luu@univ-lorraine.fr.
² Université de Lorraine, LGIPM, Metz, 57000, France. Electronic address: minh.le@univ-lorraine.fr.
³ Université de Lorraine, LGIPM, Metz, 57000, France; Institut Universitaire de France (IUF), Paris, France. Electronic address: hoai-an.le-thi@univ-lorraine.fr.

PMID: 37984042

Abstract

This paper addresses a large class of nonsmooth, nonconvex stochastic DC (difference-of-convex functions) programs in which endogenous uncertainty is involved and i.i.d. (independent and identically distributed) samples are not available. Instead, we assume that it is only possible to access Markov chains whose sequences of distributions converge to the target distributions. This setting is well motivated, as Markovian noise arises in many contexts, including Bayesian inference, reinforcement learning, and stochastic optimization in high-dimensional or combinatorial spaces. We then design a stochastic algorithm named Markov chain stochastic DCA (MCSDCA), based on DCA (DC algorithm), a well-known method for nonconvex optimization. We establish its convergence in both the asymptotic and nonasymptotic senses. MCSDCA is then applied to deep learning via PDE (partial differential equation) regularization, where two realizations of MCSDCA are constructed, namely MCSDCA-odLD and MCSDCA-udLD, based on overdamped and underdamped Langevin dynamics, respectively. Numerical experiments on time series prediction and image classification problems with a variety of neural network topologies show the merits of the proposed methods.
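
The abstract does not spell out the MCSDCA iteration, so the following is only a toy sketch of the general idea it describes: a DCA-style update driven by non-i.i.d. samples from a Markov chain whose distribution converges to the target. Everything in it is an assumption made for illustration and not the authors' algorithm: the one-dimensional objective F(x) = E[0.5 (x - xi)^2] - |x| with xi ~ N(mu, 1), the DC split g(x, xi) = 0.5 (x - xi)^2 and h(x) = |x|, the use of a discretized overdamped Langevin chain to generate xi, and the hypothetical names `ula_step` and `toy_markov_sdca`.

```python
import math
import random


def ula_step(xi, mu, step, rng):
    """One discretized overdamped Langevin step whose target is roughly N(mu, 1)."""
    grad_potential = xi - mu  # gradient of the potential U(xi) = 0.5 * (xi - mu)^2
    noise = math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return xi - step * grad_potential + noise


def toy_markov_sdca(x0=0.5, mu=1.0, step=0.1, n_iters=5000, seed=0):
    """Toy stochastic DCA on F(x) = E[0.5 * (x - xi)^2] - |x| (illustrative only).

    Assumed DC split: g(x, xi) = 0.5 * (x - xi)^2 (convex), h(x) = |x| (convex).
    The samples xi come from a Markov chain, so they are correlated and not i.i.d.
    """
    rng = random.Random(seed)
    x = x0
    xi = 0.0            # the chain starts away from its stationary regime
    running_mean = 0.0  # average of all chain samples seen so far
    for k in range(1, n_iters + 1):
        xi = ula_step(xi, mu, step, rng)
        running_mean += (xi - running_mean) / k
        # Linearize the concave part: pick a subgradient y of h(x) = |x|.
        y = (x > 0) - (x < 0)
        # Convex subproblem: minimize (1/k) * sum_i 0.5 * (x - xi_i)^2 - y * x.
        # Setting its derivative to zero gives x = mean(xi_1..xi_k) + y.
        x = running_mean + y
    return x


if __name__ == "__main__":
    print(toy_markov_sdca())
```

In this toy problem the critical point of F on x > 0 is x = mu + 1, so with the default mu = 1 and a positive starting point the iterate is expected to settle near 2, up to the sampling error of the correlated chain.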

Keywords: DC programming and DCA; Deep learning; Markov chain Monte Carlo; Partial differential equations.

MeSH terms

Algorithms
Bayes Theorem
Deep Learning*
Markov Chains
Neural Networks, Computer