Implicit incremental natural actor critic algorithm

Neural Netw. 2019 Jan;109:103-112. doi: 10.1016/j.neunet.2018.10.007. Epub 2018 Oct 21.

Abstract

Natural policy gradient (NPG) methods are promising approaches to finding locally optimal policy parameters. The NPG approach works well when optimizing complex policies with high-dimensional parameters, and its effectiveness has been demonstrated in many fields. However, incremental estimation of the NPG is computationally unstable owing to its high sensitivity to the step-size values, especially to the step-size used to update the NPG estimate. In this study, we propose a new incremental and stable algorithm for NPG estimation, called the implicit incremental natural actor critic (I2NAC), which is based on the idea of the implicit update. A convergence analysis of I2NAC is provided. The theoretical results indicate the stability of I2NAC and the instability of conventional incremental NPG methods. Numerical experiments show that I2NAC is less sensitive to the values of its meta-parameters, including the step-size for the NPG update, than the existing incremental NPG method.
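To make the implicit-update idea concrete, below is a minimal sketch contrasting a conventional incremental NPG update of the estimate with an implicit variant. It assumes the standard compatible-features formulation, in which the NPG estimate w is fit by stochastic regression of the TD error onto the score function psi = grad log pi(a|s); the function names, variables, and the closed-form shrinkage are illustrative of implicit stochastic updates in general, not the authors' exact I2NAC algorithm.

```python
import numpy as np

def explicit_npg_step(w, psi, delta, alpha):
    """Conventional incremental NPG update of the estimate w.

    psi:   compatible features, grad log pi(a|s) (assumed formulation)
    delta: TD error used as the regression target
    alpha: step-size; the effective gain alpha * ||psi||^2 is unbounded,
           so a large step-size can make this update diverge.
    """
    return w + alpha * (delta - psi @ w) * psi

def implicit_npg_step(w, psi, delta, alpha):
    """Implicit-update variant: solve
        w' = w + alpha * (delta - psi @ w') * psi
    for w'. For this linear model the fixed point has a closed form:
    the step-size is shrunk to alpha / (1 + alpha * ||psi||^2),
    so the update stays bounded even for large alpha.
    """
    shrink = 1.0 + alpha * (psi @ psi)
    return w + (alpha / shrink) * (delta - psi @ w) * psi

# Illustrative run on a synthetic regression target: the implicit step
# remains stable at a step-size where the explicit step would blow up.
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(1000):
    psi = rng.normal(size=4)
    delta = psi @ np.ones(4) + 0.1 * rng.normal()
    w = implicit_npg_step(w, psi, delta, alpha=5.0)
```

The shrinkage factor 1 + alpha * ||psi||^2 is what makes the implicit step self-stabilizing: as the step-size grows, the effective update length saturates rather than growing without bound, which is consistent with the reduced step-size sensitivity reported in the abstract.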

Keywords: Implicit update; Incremental learning; Natural actor critic; Natural policy gradient; Reinforcement learning.

MeSH terms

  • Algorithms*
  • Humans
  • Models, Theoretical*