Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks

Neural Netw. 2020 Sep;129:149-162. doi: 10.1016/j.neunet.2020.06.002. Epub 2020 Jun 6.

Abstract

Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., in solving memory-dependent tasks and in meta-learning. However, little effort has been devoted to improving RNN architectures for RL or to understanding the underlying neural mechanisms responsible for the performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy through its internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enables faster re-learning when adapting to a new task that re-composes previously learned sub-goals than learning the task from scratch. We also find that performance improves when neural activities are subject to stochastic rather than deterministic dynamics.
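
To make the architectural idea concrete, below is a minimal sketch of a two-level, multiple-timescale, stochastic RNN cell of the kind the abstract describes. It assumes leaky-integrator (MTRNN-style) dynamics, in which each level updates its internal state at a rate set by a time constant, with additive Gaussian noise making the dynamics stochastic. The layer sizes, time constants, noise scale, and class/variable names here are illustrative assumptions, not the paper's exact model or training procedure.

```python
import numpy as np

class MultipleTimescaleStochasticRNN:
    """Sketch of a two-level RNN with leaky-integrator dynamics:
    a fast level (small tau) and a slow level (large tau), each
    perturbed by Gaussian noise at every update step."""

    def __init__(self, n_in, n_fast, n_slow,
                 tau_fast=2.0, tau_slow=16.0, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.tau_fast, self.tau_slow, self.sigma = tau_fast, tau_slow, sigma
        # Input, recurrent, and inter-level weights (small random init).
        self.W_in = 0.1 * self.rng.standard_normal((n_fast, n_in))
        self.W_ff = 0.1 * self.rng.standard_normal((n_fast, n_fast))
        self.W_fs = 0.1 * self.rng.standard_normal((n_fast, n_slow))
        self.W_sf = 0.1 * self.rng.standard_normal((n_slow, n_fast))
        self.W_ss = 0.1 * self.rng.standard_normal((n_slow, n_slow))
        self.u_fast = np.zeros(n_fast)  # fast internal (pre-activation) state
        self.u_slow = np.zeros(n_slow)  # slow internal state

    def step(self, x):
        h_fast, h_slow = np.tanh(self.u_fast), np.tanh(self.u_slow)
        # Leaky integration: a larger tau yields a slower-changing state,
        # so the slow level can hold sub-goal-like context across steps.
        du_fast = (-self.u_fast + self.W_in @ x + self.W_ff @ h_fast
                   + self.W_fs @ h_slow) / self.tau_fast
        du_slow = (-self.u_slow + self.W_sf @ h_fast
                   + self.W_ss @ h_slow) / self.tau_slow
        # Stochastic dynamics: additive Gaussian noise on each state update.
        self.u_fast += du_fast + self.sigma * self.rng.standard_normal(self.u_fast.shape)
        self.u_slow += du_slow + self.sigma * self.rng.standard_normal(self.u_slow.shape)
        return np.tanh(self.u_fast), np.tanh(self.u_slow)

# Usage: feed one observation per control step; an actor-critic head
# (not shown) would read the fast activations to produce actions.
rnn = MultipleTimescaleStochasticRNN(n_in=4, n_fast=32, n_slow=8)
for t in range(10):
    obs = np.zeros(4)  # placeholder observation
    h_fast, h_slow = rnn.step(obs)
```

The separation of timescales is what allows a hierarchy to self-organize: the fast level can track rapidly varying action-level features, while the slow level, changing only gradually, is free to settle into representations of longer sub-goal segments.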

Keywords: Compositionality; Multiple timescale; Partially observable Markov decision process; Recurrent neural network; Reinforcement learning.

MeSH terms

  • Machine Learning / standards*