Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning

IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4242-4255. doi: 10.1109/TPAMI.2020.3000512. Epub 2021 Nov 3.

Abstract

Recently, many stochastic variance-reduced alternating direction methods of multipliers (ADMMs) (e.g., SAG-ADMM and SVRG-ADMM) have made exciting progress, such as achieving a linear convergence rate for strongly convex (SC) problems. However, their best-known convergence rate for non-strongly convex (non-SC) problems is O(1/T), as opposed to the O(1/T²) rate of accelerated deterministic algorithms, where T is the number of iterations. Thus, there remains a gap between the convergence rates of existing stochastic ADMM methods and those of accelerated deterministic algorithms. To bridge this gap, we introduce a new momentum acceleration trick into stochastic variance-reduced ADMM and propose a novel accelerated SVRG-ADMM method (called ASVRG-ADMM) for machine learning problems with the linear constraint Ax + By = c. We then design a linearized proximal update rule and a simple proximal one for the two classes of ADMM-style problems with B = τI and B ≠ τI, respectively, where I is the identity matrix and τ is an arbitrary bounded constant. Note that our linearized proximal update rule avoids solving sub-problems iteratively. Moreover, we prove that ASVRG-ADMM converges linearly for SC problems. In particular, ASVRG-ADMM improves the convergence rate for non-SC problems from O(1/T) to O(1/T²). Finally, we apply ASVRG-ADMM to various machine learning problems, e.g., graph-guided fused Lasso, graph-guided logistic regression, graph-guided SVM, generalized graph-guided fused Lasso, and multi-task learning, and show that ASVRG-ADMM consistently converges faster than state-of-the-art methods.
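
To make the ingredients named in the abstract concrete, the sketch below combines an SVRG-style variance-reduced gradient, a momentum (extrapolation) step, and a linearized proximal x-update inside an ADMM loop for a graph-guided-fused-Lasso-type problem, min_x (1/n) Σ_i f_i(x) + λ‖Ax‖₁, rewritten with y = Ax (so the constraint is Ax − y = 0). This is only a minimal, illustrative sketch under assumed step sizes, momentum weight, and penalty parameter; it is not the authors' exact ASVRG-ADMM algorithm or their tuned parameter choices.

```python
import numpy as np

def soft_threshold(v, kappa):
    # Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def accelerated_svrg_admm_sketch(grad_i, n, A, lam, d, epochs=30, m=None,
                                 eta=1e-2, rho=1.0, theta=0.5, rng=None):
    """Illustrative variance-reduced stochastic ADMM loop with momentum for
        min_x (1/n) sum_i f_i(x) + lam * ||A x||_1
    written as  min f(x) + lam * ||y||_1  s.t.  A x - y = 0.

    grad_i(x, i) returns the gradient of the i-th component f_i at x.
    eta (step size), rho (penalty), and theta (momentum weight) are
    placeholder assumptions, not values from the paper.
    """
    rng = rng or np.random.default_rng(0)
    m = m or n                          # inner-loop length
    x_tilde = np.zeros(d)               # snapshot point for variance reduction
    x = x_tilde.copy()
    y = np.zeros(A.shape[0])
    u = np.zeros(A.shape[0])            # scaled dual variable

    for _ in range(epochs):
        # Full gradient at the snapshot (SVRG control variate).
        g_full = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        x_prev = x.copy()
        for _ in range(m):
            # Momentum: extrapolate along the previous direction of progress.
            z = x + theta * (x - x_prev)
            x_prev = x.copy()
            # Variance-reduced stochastic gradient at the extrapolated point.
            i = rng.integers(n)
            g = grad_i(z, i) - grad_i(x_tilde, i) + g_full
            # Linearized proximal x-update: one gradient step on the smooth
            # part plus the augmented-Lagrangian term, so no inner solver.
            x = z - eta * (g + rho * A.T @ (A @ z - y + u))
            # y-update: closed-form proximal step (soft-thresholding).
            y = soft_threshold(A @ x + u, lam / rho)
            # Dual ascent step.
            u = u + A @ x - y
        x_tilde = x.copy()              # refresh the snapshot
    return x
```

The linearized x-update above is what lets the x-subproblem be solved in a single gradient-plus-proximal step rather than by an inner iterative solver, which is the practical benefit the abstract attributes to the linearized proximal update rule; the snapshot/full-gradient structure is the standard SVRG variance-reduction device.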