Decoupled neural network training with re-computation and weight prediction

PLoS One. 2023 Feb 23;18(2):e0276427. doi: 10.1371/journal.pone.0276427. eCollection 2023.

Abstract

To break the three lockings (forward, backward, and update locking) in the backpropagation (BP) process for neural network training, multiple decoupled learning methods have been investigated recently. These methods either lead to a significant drop in accuracy or suffer from a dramatic increase in memory usage. In this paper, a new form of decoupled learning, named decoupled neural network training scheme with re-computation and weight prediction (DTRP), is proposed. In DTRP, a re-computation scheme is adopted to solve the memory explosion problem, and a weight prediction scheme is proposed to deal with the weight delay caused by re-computation. Additionally, a batch compensation scheme is developed, allowing the proposed DTRP to run faster. Theoretical analysis shows that DTRP is guaranteed to converge to critical points under certain conditions. Experiments are conducted by training various convolutional neural networks on several classification datasets, showing comparable or better results than the state-of-the-art methods and BP. These experiments also reveal that, with the proposed method, the memory explosion problem is effectively solved and a significant acceleration is achieved.
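
The abstract names two mechanisms: re-computation (trading compute for memory by discarding intermediate activations and re-deriving them in the backward pass) and weight prediction (estimating the weights that will be in effect when a delayed gradient is finally applied). The sketch below illustrates these two generic ingredients in PyTorch; it is not the paper's DTRP implementation. The `predict_weights` helper, the momentum-based prediction rule, and the delay length `steps_ahead` are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's method) of re-computation
# via activation checkpointing plus momentum-based weight prediction.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def predict_weights(opt, steps_ahead):
    """Extrapolate each weight `steps_ahead` SGD steps into the future
    using the momentum buffer (one common prediction rule; the paper may
    use a different one). Returns saved copies for later restoration.
    On the first step there is no momentum buffer yet, so this is a no-op."""
    saved = []
    for group in opt.param_groups:
        lr = group["lr"]
        for p in group["params"]:
            buf = opt.state.get(p, {}).get("momentum_buffer")
            saved.append((p, p.detach().clone()))
            if buf is not None:
                p.data.add_(buf, alpha=-lr * steps_ahead)
    return saved

def restore_weights(saved):
    for p, old in saved:
        p.data.copy_(old)

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))

# 1) Forward under predicted weights; checkpoint() discards the
#    activations and re-computes them during backward (re-computation).
saved = predict_weights(opt, steps_ahead=2)   # pretend a 2-step delay
out = checkpoint(lambda t: model(t), x, use_reentrant=False)
loss = nn.functional.cross_entropy(out, y)
# 2) Backward while the predicted weights are still in place, so the
#    re-computed forward matches the one that produced the loss.
loss.backward()
# 3) Restore the true weights (gradients stay in p.grad), then step:
#    the gradient evaluated at the predicted weights updates the
#    current weights, compensating for the delay.
restore_weights(saved)
opt.step()
opt.zero_grad()
```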

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Learning*
  • Memory Disorders
  • Neural Networks, Computer*

Grants and funding

We wish to acknowledge the funding support for this project from Nanyang Technological University under the URECA Undergraduate Research Programme. This work was also supported in part by the Science and Engineering Research Council, Agency for Science, Technology and Research, Singapore, through the National Robotics Program under Grant 1922500054.