Multisource Transfer Double DQN Based on Actor Learning

IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2227-2238. doi: 10.1109/TNNLS.2018.2806087.

Abstract

Deep reinforcement learning (RL) combines the psychological mechanisms of "trial and error" and "reward and punishment" in RL with the powerful feature representation and nonlinear mapping of deep learning. Currently, it plays an essential role in the fields of artificial intelligence and machine learning. Since an RL agent needs to interact constantly with its surroundings, a deep Q network (DQN) must learn a large number of network parameters, which results in low learning efficiency. In this paper, a multisource transfer double DQN (MTDDQN) based on actor learning is proposed. Transfer learning is integrated with deep RL so that the RL agent can collect, summarize, and transfer action knowledge, including policy mimicking and feature regression, to the training of related tasks. DQN also suffers from action overestimation, i.e., the lower bound on the overestimation of the action corresponding to the maximum Q value is nonzero. Therefore, the transfer network is trained with double DQN to eliminate the error accumulation caused by action overestimation. In addition, to avoid negative transfer, i.e., to ensure strong correlation between the source and target tasks, a multisource transfer learning mechanism is applied. Atari 2600 games are tested on the Arcade Learning Environment platform to evaluate the feasibility and performance of MTDDQN by comparing it with mainstream approaches such as DQN and double DQN. Experiments show that MTDDQN achieves not only a humanlike actor-learning transfer capability, but also the desired learning efficiency and testing accuracy on the target task.
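Two ideas named in the abstract are standard enough to sketch. First, the double DQN target credited with removing the error accumulation caused by action overestimation: the online network selects the greedy next action and the target network evaluates it, instead of one network doing both. Below is a minimal NumPy sketch; the function names and toy numbers are illustrative, not taken from the paper.

```python
import numpy as np

def dqn_target(q_target_next, reward, gamma, done):
    # Vanilla DQN: the target network both selects and evaluates the
    # greedy action, which biases the bootstrap value upward.
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)

def double_dqn_target(q_online_next, q_target_next, reward, gamma, done):
    # Double DQN: the online network selects the greedy action and the
    # target network evaluates it, decoupling selection from evaluation.
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return reward + gamma * (1.0 - done) * evaluated

# Toy batch: 2 transitions, 3 actions; the second transition is terminal.
q_online_next = np.array([[1.0, 2.0, 0.5], [0.2, 0.1, 0.9]])
q_target_next = np.array([[1.8, 1.2, 0.7], [0.3, 0.4, 0.6]])
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
print(dqn_target(q_target_next, reward, 0.99, done))                        # [2.782 0.   ]
print(double_dqn_target(q_online_next, q_target_next, reward, 0.99, done))  # [2.188 0.   ]
```

Second, the policy-mimic and feature-regression losses named for transferring action knowledge from source tasks. The sketch below shows one common way such losses are combined (softmax distillation of the teacher's Q values plus an L2 penalty on intermediate features); the teacher/student naming, temperature, and weighting are assumptions for illustration, not the paper's exact formulation.

```python
import torch.nn.functional as F

def transfer_loss(student_q, student_feat, teacher_q, teacher_feat, lam=1.0, temp=1.0):
    # Policy mimic: cross-entropy between the teacher's softmax policy
    # (over its Q values) and the student's policy.
    teacher_policy = F.softmax(teacher_q / temp, dim=1)
    student_log_policy = F.log_softmax(student_q / temp, dim=1)
    mimic = -(teacher_policy * student_log_policy).sum(dim=1).mean()
    # Feature regression: push the student's intermediate features
    # toward the teacher's.
    feat_reg = F.mse_loss(student_feat, teacher_feat)
    return mimic + lam * feat_reg
```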

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Humans
  • Machine Learning*
  • Neural Networks, Computer*
  • Reinforcement, Psychology*
  • Transfer, Psychology*