Gradient Descent Learning With Floats

IEEE Trans Cybern. 2022 Mar;52(3):1763-1771. doi: 10.1109/TCYB.2020.2997399. Epub 2022 Mar 11.

Abstract

The gradient descent learning method is the main workhorse of training tasks in artificial intelligence and machine-learning research. Current theoretical studies of gradient descent consider only continuous domains, which is unrealistic since electronic computers store and process data as floating-point numbers. Although existing results suffice for the extremely tiny errors of high-precision machines, they need to be refined for low-precision cases. This article develops an understanding of the learning algorithm on computers that use floats. The performance of three gradient descent variants over the floating-point domain is investigated when the objective function is smooth. When the function is further assumed to satisfy the PŁ condition, the convergence rate can be improved. We prove that for floating-point gradient descent to obtain an error of ϵ, the iteration complexity is O(1/ϵ) in the general smooth case and O(ln(1/ϵ)) in the PŁ case. However, ϵ must be larger than the s-bit machine epsilon δ(s): ϵ ≥ Ω(δ(s)) in the deterministic case, while ϵ ≥ Ω(√δ(s)) in the stochastic case. Floating-point stochastic and sign gradient descent can both output an ϵ-noised result in O(1/ϵ²) iterations.
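
The following is a minimal NumPy sketch, not code from the paper; the quadratic objective f(x) = 0.5‖x − c‖², the step size, and the iteration count are illustrative assumptions. It only illustrates the phenomenon the abstract formalizes: when every iterate of gradient descent is rounded to s-bit floats, the attainable error stalls at a floor that scales with the machine epsilon δ(s), so lower precision leaves a larger residual error.

```python
import numpy as np

def floating_gd(dtype, steps=500, lr=0.1):
    # Illustrative smooth objective f(x) = 0.5 * ||x - c||^2 with minimizer c;
    # every quantity is stored and updated in the chosen s-bit float type.
    c = np.asarray([1.5, -2.25], dtype=dtype)
    x = np.zeros(2, dtype=dtype)
    step = dtype(lr)
    for _ in range(steps):
        grad = x - c                          # gradient of f, computed in low precision
        x = (x - step * grad).astype(dtype)   # each update rounds the iterate to s bits
    # Measure the remaining error in double precision.
    return float(np.linalg.norm(x.astype(np.float64) - c.astype(np.float64)))

for dt in (np.float16, np.float32, np.float64):
    print(dt.__name__, "machine eps:", float(np.finfo(dt).eps),
          "residual error:", floating_gd(dt))
```

Running this prints, for each format, its machine epsilon next to the residual error at which the iteration stalls; the floor shrinks as the precision grows, mirroring the ϵ ≥ Ω(δ(s)) lower bound stated for the deterministic case.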