Context transfer in reinforcement learning using action-value functions

Amin Mousavi; Babak Nadjar Araabi; Majid Nili Ahmadabadi

doi:10.1155/2014/428567

Comput Intell Neurosci. 2014:2014:428567. doi: 10.1155/2014/428567. Epub 2014 Dec 31.

Authors

Amin Mousavi¹, Babak Nadjar Araabi², Majid Nili Ahmadabadi²

Affiliations

¹ Cognitive Robotics Lab, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, P.O. Box 14395-515, Tehran, Iran.
² Cognitive Robotics Lab, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, P.O. Box 14395-515, Tehran, Iran; School of Cognitive Science, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran.

Abstract

This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, refers to knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different state or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped, a requirement formulated in terms of MDP homomorphism. The learning framework is Q-learning. To transfer knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned in the source tasks are mapped to sets of Q-values for the target task. These transferred Q-values are merged and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task.
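
The transfer step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how intervals are built or merged, so the code below assumes that each target state-action pair collects the Q-values mapped to it from all source tasks, that the interval is the [min, max] of those values, and that the target Q-table is initialized at the interval midpoint. The function name `transfer_q_values`, the dictionary-based partial mappings, and the toy tasks are all hypothetical.

```python
from collections import defaultdict


def transfer_q_values(source_q_tables, mappings, target_pairs, default=0.0):
    """Initialize a target Q-table from several source Q-tables.

    source_q_tables: list of dicts {(state, action): q_value}, one per source task.
    mappings: list of dicts {(source_state, source_action): [target pairs]},
        one per source task. Each mapping is partial: source pairs that are
        missing from it transfer nothing.
    target_pairs: iterable of all (state, action) pairs of the target task.
    default: initial Q-value for target pairs that receive no transferred value.
    """
    # Collect, for every target pair, the set of Q-values mapped onto it.
    collected = defaultdict(list)
    for q_table, mapping in zip(source_q_tables, mappings):
        for source_pair, q_value in q_table.items():
            for target_pair in mapping.get(source_pair, ()):
                collected[target_pair].append(q_value)

    # ASSUMPTION: represent the transferred knowledge as a [min, max] interval
    # and initialize at its midpoint. The paper's merging rule may differ.
    intervals = {}
    initial_q = {}
    for pair in target_pairs:
        values = collected.get(pair)
        if values:
            low, high = min(values), max(values)
            intervals[pair] = (low, high)
            initial_q[pair] = (low + high) / 2.0
        else:
            initial_q[pair] = default
    return initial_q, intervals


# Toy example: two source tasks with different state-action spaces.
source_a = {("s0", "left"): 1.0, ("s0", "right"): 3.0}
source_b = {("x0", "go"): 2.0}
map_a = {("s0", "left"): [("t0", "a0")], ("s0", "right"): [("t0", "a1")]}
map_b = {("x0", "go"): [("t0", "a0")]}
target_pairs = [("t0", "a0"), ("t0", "a1"), ("t1", "a0")]

initial_q, intervals = transfer_q_values(
    [source_a, source_b], [map_a, map_b], target_pairs
)
```

In this toy setup, the pair `("t0", "a0")` receives values from both source tasks, `("t0", "a1")` from one, and `("t1", "a0")` from none, so it falls back to the default. The resulting `initial_q` would then serve as the starting table for ordinary Q-learning on the target task.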

MeSH terms

Artificial Intelligence*
Choice Behavior
Environment
Humans
Knowledge of Results, Psychological
Markov Chains
Models, Psychological*
Reinforcement, Psychology*
Transfer, Psychology*