Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices

Junyun Zhao; Siyuan Huang; Osama Yousuf; Yutong Gao; Brian D Hoskins; Gina C Adam

doi:10.3389/fnins.2021.749811

Front Neurosci. 2021 Nov 22:15:749811. doi: 10.3389/fnins.2021.749811. eCollection 2021.

Authors

Junyun Zhao¹, Siyuan Huang¹, Osama Yousuf², Yutong Gao¹, Brian D Hoskins³, Gina C Adam²

Affiliations

¹ Department of Computer Science, George Washington University, Washington, DC, United States.
² Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States.
³ Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States.

Abstract

While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work combines mini-batch gradient descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both inside the memristor array (rank-seq) and outside it (rank-sum). Our results show that the streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
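The ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a truncated SVD stands in for streaming batch PCA and NMF, the layer sizes, learning rate, and conductance step `step` are hypothetical, and device non-idealities are not modeled. The function names are also assumptions made for illustration. The sketch shows stochastic rounding of a weight update to a discrete step, a rank-sum update (factors summed outside the array, then rounded once), and a rank-seq update (each rank-1 outer product rounded and applied in turn).

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Round x to a multiple of `step`, rounding up with probability
    equal to the fractional remainder, so the result is unbiased."""
    scaled = x / step
    floor = np.floor(scaled)
    frac = scaled - floor
    return step * (floor + (rng.random(x.shape) < frac))

def low_rank_factors(grad, rank):
    """Stand-in decomposition (truncated SVD, NOT streaming batch PCA or NMF).
    Returns left (n_out x rank) and right (rank x n_in) factors."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

def rank_sum_update(w, left, right, lr, step, rng):
    """Reconstruct the full update outside the array, then round it once."""
    delta = -lr * (left @ right)
    return w + stochastic_round(delta, step, rng)

def rank_seq_update(w, left, right, lr, step, rng):
    """Apply one rounded rank-1 outer-product update per rank, in sequence."""
    w = w.copy()
    for k in range(left.shape[1]):
        delta_k = -lr * np.outer(left[:, k], right[k])
        w += stochastic_round(delta_k, step, rng)
    return w

# Hypothetical layer: 784 inputs, 100 outputs, mini-batch of 32, rank 3.
rng = np.random.default_rng(0)
n_in, n_out, batch, rank = 784, 100, 32, 3
x = rng.random((batch, n_in))                 # layer inputs
d = rng.standard_normal((batch, n_out))       # backpropagated errors
grad = d.T @ x / batch                        # mini-batch averaged gradient
w = rng.standard_normal((n_out, n_in)) * 0.1  # current weights

left, right = low_rank_factors(grad, rank)
lr, step = 0.1, 0.01                          # assumed learning rate and step
w_sum = rank_sum_update(w, left, right, lr, step, rng)
w_seq = rank_seq_update(w, left, right, lr, step, rng)

full_values = n_out * n_in                    # values in the full gradient
factor_values = rank * (n_out + n_in)         # values in the rank-k factors
print("stored values, full gradient:", full_values)
print("stored values, rank-%d factors:" % rank, factor_values)
```

In this sketch the memory saving comes from storing `rank * (n_out + n_in)` values in place of the `n_out * n_in` values of the full gradient. The rank-seq variant never forms the full update matrix, which is why it maps naturally onto outer-product writes to a crossbar array.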

Keywords: ReRAM; gradient data decomposition; memristor; non-idealities; non-negative matrix factorization; principal component analysis.