Continuous-Time Reinforcement Learning: New Design Algorithms With Theoretical Insights and Performance Guarantees

IEEE Trans Neural Netw Learn Syst. 2024 May 10:PP. doi: 10.1109/TNNLS.2024.3392237. Online ahead of print.

Abstract

Continuous-time reinforcement learning (CT-RL) methods hold great promise in real-world applications. Adaptive dynamic programming (ADP)-based CT-RL algorithms have achieved great successes, especially in their theoretical developments. However, these methods have yet to be demonstrated on realistic, meaningful learning control problems. Thus, the goal of this work is to introduce a suite of new excitable integral reinforcement learning (EIRL) algorithms for control of CT affine nonlinear systems. This work develops a new excitation framework that improves persistence of excitation (PE) and numerical performance via input/output insights from classical control. Furthermore, when the system dynamics afford a physically motivated partition into distinct dynamical loops, the proposed methods break the control problem into smaller subproblems, resulting in reduced complexity. By leveraging the known affine nonlinear dynamics, the methods achieve well-behaved system responses and considerable data efficiency. The work provides convergence, solution optimality, and closed-loop stability guarantees for the proposed methods, and it demonstrates these guarantees on a significant application problem: control of an unstable, nonminimum-phase hypersonic vehicle (HSV).
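For context, the integral reinforcement learning setting that EIRL-type methods build on can be sketched as follows; the symbols here are generic placeholders from the standard CT affine optimal control formulation, not notation taken from this paper. The plant is affine in the control, and the value function satisfies an interval Bellman relation over a sampling window of length T:

\[
\dot{x} = f(x) + g(x)\,u, \qquad
V\big(x(t)\big) = \int_{t}^{t+T} \Big( Q(x) + u^{\top} R\, u \Big)\, d\tau \; + \; V\big(x(t+T)\big),
\]

where \(Q(x) \ge 0\) penalizes the state and \(R \succ 0\) penalizes the control. Under standard assumptions, the optimal policy associated with the optimal value \(V^{*}\) takes the form

\[
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x),
\]

so critic updates can be driven by trajectory data collected over successive intervals, with the quality of those updates depending on how well the input excites the system, which is where the PE-improving excitation framework described in the abstract enters.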