Distributional generative adversarial imitation learning with reproducing kernel generalization

Neural Netw. 2023 Aug;165:43-59. doi: 10.1016/j.neunet.2023.05.027. Epub 2023 May 25.

Abstract

Generative adversarial imitation learning (GAIL) regards imitation learning (IL) as a distribution matching problem between the state-action distributions of the expert policy and the learned policy. In this paper, we focus on the generalization and computational properties of policy classes. We prove that generalization can be guaranteed in GAIL when the class of policies is well controlled. Building on this policy generalization capability, we introduce distributional reinforcement learning (RL) into GAIL and propose the greedy distributional soft gradient (GDSG) algorithm to solve GAIL. The main advantages of GDSG are: (1) Q-value overestimation, a crucial factor in the instability of GAIL under off-policy training, is alleviated by distributional RL; (2) by incorporating a maximum entropy objective, the policy gains performance and sample efficiency through sufficient exploration. Moreover, GDSG attains a sublinear convergence rate to a stationary solution. Comprehensive experiments in MuJoCo environments show that GDSG mimics expert demonstrations better than previous GAIL variants.
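To make the two ingredients named above concrete, the sketch below is a minimal illustration (not the paper's GDSG implementation) of how a GAIL-style discriminator loss can be paired with a quantile-based distributional critic. The network sizes, the names Discriminator and QuantileCritic, and the dimensions STATE_DIM, ACTION_DIM, and N_QUANTILES are all assumptions made for the example; PyTorch is used for brevity.

```python
# Minimal sketch only (not the paper's GDSG algorithm): a GAIL-style
# discriminator loss plus a quantile-based distributional critic.
# STATE_DIM, ACTION_DIM, N_QUANTILES and all layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, N_QUANTILES = 11, 3, 32


class Discriminator(nn.Module):
    """Scores (state, action) pairs; trained to separate expert data from policy rollouts."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


class QuantileCritic(nn.Module):
    """Distributional critic: predicts N_QUANTILES quantiles of the return
    rather than a single Q-value, which helps temper overestimation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
            nn.Linear(64, N_QUANTILES))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


def discriminator_loss(disc, expert_s, expert_a, policy_s, policy_a):
    """Standard GAIL distribution-matching objective: expert pairs -> 1, policy pairs -> 0."""
    e = disc(expert_s, expert_a)
    p = disc(policy_s, policy_a)
    return (F.binary_cross_entropy_with_logits(e, torch.ones_like(e))
            + F.binary_cross_entropy_with_logits(p, torch.zeros_like(p)))


def quantile_huber_loss(pred, target, kappa=1.0):
    """Quantile-regression Huber loss (QR-DQN style) between predicted and
    target return quantiles, each of shape (batch, N_QUANTILES)."""
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES
    td = target.unsqueeze(1) - pred.unsqueeze(2)          # (batch, N, N) pairwise TD errors
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean()
```

In a maximum-entropy setup such as the one the abstract describes, the actor would then be updated against the critic's predicted return quantiles plus an entropy bonus; that actor update and the soft target construction are omitted from this sketch.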

Keywords: Computational properties; Distributional reinforcement learning; Generative adversarial imitation learning; Policy generalization.

MeSH terms

  • Algorithms
  • Generalization, Psychological
  • Imitative Behavior*
  • Learning*
  • Reinforcement, Psychology