Accelerating Minibatch Stochastic Gradient Descent Using Typicality Sampling

IEEE Trans Neural Netw Learn Syst. 2020 Nov;31(11):4649-4659. doi: 10.1109/TNNLS.2019.2957003. Epub 2020 Oct 29.

Abstract

Machine learning, especially deep neural networks, has developed rapidly in fields such as computer vision, speech recognition, and reinforcement learning. Although minibatch stochastic gradient descent (SGD) is one of the most popular stochastic optimization methods for training deep networks, it shows a slow convergence rate due to the large noise in the gradient approximation. In this article, we attempt to remedy this problem by building a more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare its convergence properties with those of standard minibatch SGD. Experimental results demonstrate that our batch selection scheme works well and that more complex minibatch SGD variants can also benefit from the proposed batch selection strategy.
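
The abstract describes the method only at a high level. As a rough sketch of the general idea, and not the authors' algorithm, the Python snippet below builds each minibatch from a mix of high-density "typical" samples and uniformly drawn ones before applying a standard SGD step. The kernel-density typicality score, the mixing ratio rho, and all function names are assumptions introduced purely for illustration.

# Illustrative sketch only: the abstract does not specify the typicality
# criterion or batch composition, so the density-based scoring, the mixing
# ratio `rho`, and all names below are assumptions for exposition.
import numpy as np

def typicality_scores(X, bandwidth=1.0):
    """Assumed typicality proxy: a simple Gaussian kernel density estimate."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2)).mean(axis=1)

def sample_typical_batch(X, y, batch_size, rho=0.5, rng=None):
    """Mix high-typicality samples with uniformly drawn ones (assumed scheme)."""
    rng = rng if rng is not None else np.random.default_rng()
    scores = typicality_scores(X)
    n_typical = int(rho * batch_size)
    typical_idx = np.argsort(-scores)[:n_typical]            # most "typical" points
    random_idx = rng.choice(len(X), batch_size - n_typical, replace=False)
    idx = np.concatenate([typical_idx, random_idx])           # duplicates possible in this toy version
    return X[idx], y[idx]

def sgd_step(w, X_batch, y_batch, lr=0.01):
    """One minibatch SGD step on a least-squares objective (for illustration)."""
    grad = X_batch.T @ (X_batch @ w - y_batch) / len(X_batch)
    return w - lr * grad

# Toy usage: linear regression trained with typicality-sampled minibatches.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=512)
w = np.zeros(10)
for _ in range(200):
    Xb, yb = sample_typical_batch(X, y, batch_size=32, rho=0.5, rng=rng)
    w = sgd_step(w, Xb, yb)

In this toy version the only design choice carried over from the abstract is that batch construction is biased toward representative samples rather than drawn fully at random; how typicality is measured and how the batch is composed in the actual paper may differ.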

Publication types

  • Research Support, Non-U.S. Gov't