Efficient model learning methods for actor-critic control

IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):591-602. doi: 10.1109/TSMCB.2011.2170565. Epub 2011 Dec 7.

Abstract

We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
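
The two ingredients named in the abstract, local linear regression (LLR) function approximators and a learned process model that speeds up the policy update, can be illustrated with a short sketch. The Python code below is an assumption-laden toy, not the authors' implementation: the LLRMemory class, all hyperparameters, and the simplified swing-up dynamics are hypothetical, and the actor update uses a finite difference through the learned model as a stand-in for the paper's model-based update rule.

import numpy as np

class LLRMemory:
    """Store (x, y) samples; predict at a query point by fitting an
    affine model to the k nearest stored samples (local linear
    regression)."""
    def __init__(self, k=8, max_samples=2000):
        self.k, self.max_samples = k, max_samples
        self.X, self.Y = [], []

    def add(self, x, y):
        self.X.append(np.atleast_1d(x).astype(float))
        self.Y.append(np.atleast_1d(y).astype(float))
        if len(self.X) > self.max_samples:      # forget the oldest sample
            self.X.pop(0); self.Y.pop(0)

    def predict(self, x):
        x = np.atleast_1d(x).astype(float)
        X, Y = np.array(self.X), np.array(self.Y)
        if len(X) < self.k:                     # not enough data yet
            return np.zeros(Y.shape[1]) if len(Y) else np.zeros(1)
        idx = np.argsort(np.linalg.norm(X - x, axis=1))[: self.k]
        A = np.hstack([X[idx], np.ones((self.k, 1))])   # affine features
        beta, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
        return np.append(x, 1.0) @ beta

def pendulum_step(s, u, dt=0.05):
    """One Euler step of a damped pendulum; theta = 0 is hanging down."""
    th, thd = s
    thdd = -9.81 * np.sin(th) - 0.2 * thd + u
    thd = thd + dt * thdd
    th = (th + dt * thd + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return np.array([th, thd])

critic = LLRMemory()   # state           -> value estimate
model  = LLRMemory()   # (state, action) -> next state
actor  = LLRMemory()   # state           -> action

gamma, alpha_c, alpha_a, du = 0.97, 0.3, 0.5, 0.1
rng = np.random.default_rng(0)

for episode in range(50):
    s = np.array([0.0, 0.0])                    # start hanging down
    for t in range(200):
        u = float(np.clip(actor.predict(s)[0] + rng.normal(0.0, 1.0), -3, 3))
        s_next = pendulum_step(s, u)
        r = -np.cos(s_next[0])                  # +1 when upright
        # TD(0) critic update, written back into the LLR memory
        v, v_next = critic.predict(s)[0], critic.predict(s_next)[0]
        critic.add(s, v + alpha_c * (r + gamma * v_next - v))
        # learn the process model from the observed transition
        model.add(np.append(s, u), s_next)
        # model-based actor update: nudge the action along an estimate of
        # dV/du obtained by a finite difference through the learned model
        dV = critic.predict(model.predict(np.append(s, u + du)))[0] - v_next
        actor.add(s, np.clip(u + alpha_a * dV / du, -3.0, 3.0))
        s = s_next

The abstract's second algorithm would instead drop the explicit actor, learn a reference model of the desired behavior, and invert the learned process model to obtain actions; that variant is omitted here for brevity.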

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Computer Simulation
  • Decision Support Techniques*
  • Models, Theoretical*
  • Pattern Recognition, Automated / methods*