PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function

Jie Chen; Tao Wu; Meiping Shi; Wei Jiang

doi:10.3390/s20195626

Sensors (Basel). 2020 Oct 1;20(19):5626. doi: 10.3390/s20195626.

Authors

Jie Chen¹, Tao Wu¹, Meiping Shi¹, Wei Jiang¹

Affiliation

¹ The College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China.

Abstract

Autonomous driving based on artificial intelligence is widely viewed as a promising route to putting autonomous vehicles on the road in the near future. In recent years, considerable progress has been made in applying Deep Reinforcement Learning (DRL) to end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial, because the reward functions are typically pre-defined using expert knowledge. This paper proposes a human-in-the-loop DRL algorithm that learns personalized autonomous driving behavior progressively. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework; the resulting algorithm is called PORF-DDPG. PORF consists of two parts: the first is a typical pre-defined reward function on the system state; the second, which is the main contribution of this paper, is a Deep Neural Network (DNN) that represents the human observer's intention to adjust the driving behavior. The DNN-based reward model is learned progressively from front-view images through active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic, constrained scenarios where classic DRL methods might frequently produce dangerous collision events. Experimental results show that the proposed method for learning autonomous driving behavior exhibits online learning capability and environmental adaptability.
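
The two-part reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify these details. The sketch assumes that the two parts are simply added, that a small linear model can stand in for the DNN, that a short feature vector can stand in for the front-view images, and that human intervention arrives as a scalar target reward. All names (`predefined_reward`, `LearnedReward`, `porf_reward`) and all constants are hypothetical, and the DDPG actor-critic that would consume the reward is omitted.

```python
from dataclasses import dataclass, field


def predefined_reward(speed, lateral_offset, collided):
    """Hypothetical hand-crafted reward on the system state (assumption)."""
    if collided:
        return -10.0
    # Reward forward progress, penalize drifting from the lane center.
    return 0.1 * speed - abs(lateral_offset)


@dataclass
class LearnedReward:
    """Linear stand-in for the DNN-based reward term (assumption)."""
    n_features: int
    lr: float = 0.1
    weights: list = field(default_factory=list)

    def __post_init__(self):
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, human_label):
        """One gradient step on squared error toward the human's feedback."""
        error = self.predict(features) - human_label
        self.weights = [w - self.lr * error * x
                        for w, x in zip(self.weights, features)]
        return error ** 2


def porf_reward(state, features, model):
    """Combined reward: pre-defined term plus learned term (sum is assumed)."""
    return predefined_reward(**state) + model.predict(features)


if __name__ == "__main__":
    model = LearnedReward(n_features=2)
    state = {"speed": 10.0, "lateral_offset": 0.2, "collided": False}
    features = [1.0, 0.5]  # placeholder for features of a front-view image

    print("reward before feedback:", porf_reward(state, features, model))
    # The human observer repeatedly signals that this situation is undesirable.
    for _ in range(50):
        model.update(features, human_label=-1.0)
    print("reward after feedback:", porf_reward(state, features, model))
```

In this sketch the learned term starts at zero, so the agent initially sees only the pre-defined reward. Each round of human feedback shifts the learned term, which is one simple reading of "progressively optimized".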

Keywords: autonomous driving; deep reinforcement learning; progressive optimization; reward function; sequential frames.

MeSH terms

Artificial Intelligence*
Automobile Driving*
Humans
Neural Networks, Computer*
Reward

Grants and funding

61973311/NSFC Grants