Markov chain stochastic DCA and applications in deep learning with PDEs regularization

Neural Netw. 2024 Feb:170:149-166. doi: 10.1016/j.neunet.2023.11.032. Epub 2023 Nov 13.

Abstract

This paper addresses a large class of nonsmooth nonconvex stochastic DC (difference-of-convex functions) programs that involve endogenous uncertainty and for which i.i.d. (independent and identically distributed) samples are not available. Instead, we assume access only to Markov chains whose sequences of distributions converge to the target distributions. This setting is well motivated, since Markovian noise arises in many contexts, including Bayesian inference, reinforcement learning, and stochastic optimization in high-dimensional or combinatorial spaces. We design a stochastic algorithm named Markov chain stochastic DCA (MCSDCA), based on DCA (DC algorithm), a well-known method for nonconvex optimization, and establish its convergence in both asymptotic and nonasymptotic senses. MCSDCA is then applied to deep learning via PDEs (partial differential equations) regularization, where two realizations are constructed, namely MCSDCA-odLD and MCSDCA-udLD, based on overdamped and underdamped Langevin dynamics, respectively. Numerical experiments on time series prediction and image classification problems with a variety of neural network topologies show the merits of the proposed methods.
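
To make the general idea concrete, the following is a minimal toy sketch, not the MCSDCA of the paper. It assumes a one-dimensional DC objective f(x) = g(x) - h(x) with g(x) = E[(x - xi)^2]/2 and h(x) = lam*|x|, where xi follows N(mu, 1) but is only observed through a Markov chain generated by unadjusted overdamped Langevin steps. The problem, the function names (langevin_step, toy_markov_dca), and all parameter values are illustrative assumptions; in particular, the target distribution here does not depend on x, so the endogenous uncertainty treated in the paper is not modelled.

```python
import math
import random


def langevin_step(xi, mu, step, rng):
    """One unadjusted overdamped Langevin step for the potential U(xi) = (xi - mu)^2 / 2."""
    grad_u = xi - mu
    return xi - step * grad_u + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)


def toy_markov_dca(mu=2.0, lam=0.5, x0=1.0, n_iters=5000, step=0.1, seed=0):
    """Toy stochastic DCA driven by Markov chain samples (illustrative only).

    Each iteration:
      1. advance the Markov chain by one Langevin step (no i.i.d. samples used);
      2. update the running average of the samples, which estimates E[xi];
      3. pick a subgradient y of the concave part's h(x) = lam * |x| at x;
      4. solve the convex subproblem argmin_x g_hat(x) - y * x in closed form,
         where g_hat replaces E[xi] by the running average.
    """
    rng = random.Random(seed)
    x, xi, running_mean = x0, 0.0, 0.0
    for k in range(1, n_iters + 1):
        xi = langevin_step(xi, mu, step, rng)
        running_mean += (xi - running_mean) / k
        y = lam if x >= 0 else -lam  # a valid subgradient of lam * |x|
        x = running_mean + y         # minimizer of (x - running_mean)^2 / 2 - y * x
    return x


if __name__ == "__main__":
    # For x > 0 the critical point of f solves x - mu - lam = 0, i.e. x = mu + lam.
    print(toy_markov_dca())
```

With the default values, the iterate is expected to settle near mu + lam = 2.5, up to the sampling error of the correlated chain. The running average stands in for the sample-based approximation of the convex part; an underdamped variant would add a velocity variable to the sampler while leaving the DCA step unchanged.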

Keywords: DC programming and DCA; Deep learning; Markov chain Monte Carlo; Partial differential equations.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Deep Learning*
  • Markov Chains
  • Neural Networks, Computer