Toward reliable designs of data-driven reinforcement learning tracking control for Euler-Lagrange systems

Zhikai Yao; Jianyong Yao

doi:10.1016/j.neunet.2022.05.017

Neural Netw. 2022 Sep:153:564-575. doi: 10.1016/j.neunet.2022.05.017. Epub 2022 Jun 3.

Authors

Zhikai Yao¹, Jianyong Yao²

Affiliations

¹ College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu Province, 210023, China; School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu Province, 210094, China.
² School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu Province, 210094, China. Electronic address: jerryyao.buaa@gmail.com.

PMID: 35843117
DOI: 10.1016/j.neunet.2022.05.017

Abstract

This paper addresses reinforcement-learning-based direct signal tracking control, with the objective of developing mathematically suitable and practically useful design approaches. Specifically, we aim to provide reliable, easy-to-implement designs that yield reproducible neural-network-based solutions. The proposed design combines two control design frameworks: a reinforcement-learning-based, data-driven approach that provides the needed adaptation and (sub)optimality, and a backstepping-based approach that provides a closed-loop stability framework. We build on the established direct heuristic dynamic programming (dHDP) learning paradigm for online learning and adaptation, and on a backstepping design for an important class of nonlinear dynamics described as Euler-Lagrange systems. We provide theoretical guarantees for the stability of the overall dynamic system, the weight convergence of the approximating nonlinear neural networks, and the Bellman (sub)optimality of the resulting control policy. Simulations demonstrate significantly improved design performance of the proposed approach over the original dHDP.
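
As a rough illustration of the backstepping half of the design only, the following minimal sketch tracks a sinusoidal reference with a one-degree-of-freedom Euler-Lagrange system (a damped pendulum). It is not the paper's controller: the plant model is assumed fully known, there is no dHDP learning component, and the pendulum parameters, the gains `k1` and `k2`, the reference signal, and the function name `simulate` are all assumptions chosen for this example.

```python
import math


def simulate(k1=5.0, k2=5.0, dt=1e-3, t_end=10.0):
    """Backstepping tracking of q_d(t) = sin(t) for a damped pendulum.

    Assumed plant (hypothetical parameters):
        m*l^2 * ddq + b*dq + m*g*l*sin(q) = tau
    Returns the list of absolute position tracking errors |q - q_d|.
    """
    m, l, b, g = 1.0, 1.0, 0.1, 9.81   # assumed mass, length, damping, gravity
    inertia = m * l * l
    q, dq = 0.5, 0.0                   # initial state, deliberately off the reference
    t = 0.0
    errors = []
    for _ in range(int(round(t_end / dt))):
        # Reference trajectory and its first two derivatives.
        qd, dqd, ddqd = math.sin(t), math.cos(t), -math.sin(t)

        # Step 1: position error and virtual velocity command.
        z1 = q - qd
        alpha = dqd - k1 * z1
        dalpha = ddqd - k1 * (dq - dqd)

        # Step 2: velocity error and the actual torque command.
        z2 = dq - alpha
        tau = (inertia * dalpha + b * dq + m * g * l * math.sin(q)
               - z1 - k2 * z2)

        # Plant dynamics, integrated with a forward-Euler step.
        ddq = (tau - b * dq - m * g * l * math.sin(q)) / inertia
        q += dt * dq
        dq += dt * ddq
        t += dt
        errors.append(abs(z1))
    return errors


if __name__ == "__main__":
    errs = simulate()
    print("initial error:", errs[0])
    print("max error over the last second:", max(errs[-1000:]))
```

With this control law the error dynamics reduce to dz1/dt = z2 - k1*z1 and M*dz2/dt = -z1 - k2*z2, so the Lyapunov function V = z1^2/2 + M*z2^2/2 decreases along trajectories when the model is exact. In a data-driven setting the model-based terms in `tau` would not be available exactly, which is where a learned component would enter.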

Keywords: Backstepping; Direct heuristic dynamic programming (dHDP); Reinforcement learning; Tracking control.

MeSH terms

Algorithms*
Computer Simulation
Feedback
Neural Networks, Computer
Nonlinear Dynamics*