A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization

IEEE Trans Pattern Anal Mach Intell. 2021 Jun 8:PP. doi: 10.1109/TPAMI.2021.3087328. Online ahead of print.

Abstract

Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with data size and is expensive for huge data. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient~(HSDMPG) algorithm for strongly convex problems with linear prediction structure, e.g.~least squares and logistic/softmax regression. HSDMPG~enjoys improved computational complexity that is data-size-independent for large-scale problems. It iteratively samples an evolving~minibatch of individual losses to estimate the original problem, and efficiently minimizes the sampled smaller-sized subproblems. For strongly convex loss of n components, HSDMPG~attains an ϵ-optimization-error within [Formula: see text] stochastic gradient evaluations, where κ is condition number, ζ = 1 for quadratic loss and ζ = 2 for generic loss. For large-scale problems, our complexity outperforms those of SVRG-type algorithms with/without dependence on data size. Particularly, when ϵ = O(1/√n) which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively O (n0.5log2(n)) and O (n0.5log3(n)), which for the first time achieves optimal generalization in less than a single pass over data. Besides, we extend HSDMPG~to online strongly convex problems and prove its higher efficiency over the prior algorithms. Numerical results demonstrate the computational advantages of~HSDMPG.