Prioritizing Useful Experience Replay for Heuristic Dynamic Programming-Based Learning Systems

IEEE Trans Cybern. 2019 Nov;49(11):3911-3922. doi: 10.1109/TCYB.2018.2853582. Epub 2018 Jul 30.

Abstract

The adaptive dynamic programming controller usually needs a long training period because the data usage efficiency is relatively low by discarding the samples once used. Prioritized experience replay (ER) promotes important experiences and is more efficient in learning the control process. This paper proposes integrating an efficient learning capability of prioritized ER design into heuristic dynamic programming (HDP). First, a one time-step backward state-action pair is used to design the ER tuple and, thus, avoids the model network. Second, a systematic approach is proposed to integrate the ER into both critic and action networks of HDP controller design. The proposed approach is tested for two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For fair comparison, we set the same initial weight parameters and initial starting states for both traditional HDP and the proposed approach under the same simulation environment. The proposed approach improves the required average number of trials to succeed by 60.56% for cart-pole, and 56.89% for triple-link balancing tasks, in comparison with the traditional HDP approach. Also, we have added results of ER-based HDP for comparison. Moreover, theoretical convergence analysis is presented to guarantee the stability of the proposed control design.