A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

Shi Pu; Alex Olshevsky; Ioannis Ch Paschalidis

doi:10.1109/tac.2021.3126253


IEEE Trans Automat Contr. 2022 Nov;67(11):5900-5915. doi: 10.1109/tac.2021.3126253. Epub 2021 Nov 9.

Authors

Shi Pu¹, Alex Olshevsky², Ioannis Ch Paschalidis²

Affiliations

¹ School of Data Science, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China.
² Department of Electrical and Computer Engineering and the Division of Systems Engineering, Boston University, Boston, MA.

Abstract

This paper is concerned with minimizing the average of n cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves, in expectation, the same optimal network-independent convergence rate as centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach this asymptotic convergence rate. Moreover, we construct a "hard" optimization problem that proves the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
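To make the DSGD iteration concrete, here is a minimal sketch under stated assumptions; it is not the authors' code. It assumes the common form in which each agent takes a local noisy gradient step and then averages with its neighbors through a doubly stochastic mixing matrix W, x_i <- sum_j W[i][j] * (x_j - alpha_k * g_j), with a diminishing step size alpha_k = theta / (mu * (k + K)). The quadratic costs f_i(x) = (mu/2) * (x - b_i)^2, the Gaussian gradient noise, the ring topology, and the names `ring_mixing_matrix`, `dsgd`, `theta`, and `K` are all hypothetical choices for illustration.

```python
import random


def ring_mixing_matrix(n):
    """Hypothetical doubly stochastic weights on a ring (assumes n >= 3):
    each agent gives weight 1/3 to itself and to each of its two neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1 / 3
        W[i][(i - 1) % n] += 1 / 3
        W[i][(i + 1) % n] += 1 / 3
    return W


def dsgd(b, num_iters=2000, noise_std=1.0, mu=1.0, theta=2.0, K=10, seed=0):
    """Sketch of DSGD on the assumed costs f_i(x) = (mu/2) * (x - b[i])**2.

    The global objective is the average of the f_i, minimized at mean(b).
    Gradients are corrupted by zero-mean Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    n = len(b)
    W = ring_mixing_matrix(n)
    x = [0.0] * n  # every agent starts at 0
    for k in range(num_iters):
        alpha = theta / (mu * (k + K))  # assumed diminishing step size
        # Local step: each agent j uses a noisy gradient of f_j at its own x_j.
        y = [
            x[j] - alpha * (mu * (x[j] - b[j]) + rng.gauss(0.0, noise_std))
            for j in range(n)
        ]
        # Consensus step: each agent averages its neighbors' intermediate values.
        x = [sum(W[i][j] * y[j] for j in range(n)) for i in range(n)]
    return x


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(dsgd(b))  # every entry should end up close to mean(b) = 3.0
```

Because W is doubly stochastic, the network average of the iterates follows a centralized SGD recursion whose gradient noise is averaged over the n agents. This is the mechanism behind DSGD eventually matching the centralized rate, while each agent's deviation from that average produces the transient phase that the paper analyzes.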

Keywords: convex optimization; distributed optimization; stochastic gradient descent; stochastic programming.

Grants and funding

R01 GM135930/GM/NIGMS NIH HHS/United States