A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space

Neural Netw. 2023 Jul:164:419-427. doi: 10.1016/j.neunet.2023.04.042. Epub 2023 May 5.

Abstract

Although reinforcement learning (RL) has made numerous breakthroughs in recent years, reward-sparse environments remain challenging and require further exploration. Many studies improve agent performance by introducing state-action pairs experienced by an expert. However, such strategies depend heavily on the quality of the expert demonstration, which is rarely optimal in real-world environments, and they struggle to learn from sub-optimal demonstrations. In this paper, a self-imitation learning algorithm based on task space division is proposed to acquire high-quality demonstrations efficiently during the training process. To determine the quality of a trajectory, well-designed criteria are defined in the task space for selecting better demonstrations. The results show that the proposed algorithm improves the success rate of robot control and achieves a high mean Q-value per step. The proposed framework shows great potential for learning from demonstrations generated by the agent's own policy in sparse-reward settings, and it can be applied to any reward-sparse environment whose task space can be divided.
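To make the core idea concrete, here is a minimal sketch of one plausible reading of the abstract: the task space is discretized into regions, and the best self-generated trajectory in each region, judged here by episode return, is kept as a demonstration for later imitation updates. The grid discretization, the return-based quality criterion, and every name below are illustrative assumptions; the paper's actual criteria and implementation are not given in this abstract.

```python
# Illustrative sketch only: the region grid, the return-based quality
# criterion, and all names below are assumptions, not the paper's code.

class RegionalDemoBuffer:
    """Keeps the best self-generated trajectory for each task-space region."""

    def __init__(self, cell_size=0.5):
        self.cell_size = cell_size   # width of one task-space cell (assumed)
        self.best = {}               # region key -> (score, trajectory)

    def region_of(self, goal):
        """Divide the task space by discretizing the goal coordinates."""
        return tuple(int(c // self.cell_size) for c in goal)

    def consider(self, goal, trajectory, score):
        """Keep the trajectory if it beats the region's current best score."""
        key = self.region_of(goal)
        if key not in self.best or score > self.best[key][0]:
            self.best[key] = (score, trajectory)

    def demonstrations(self):
        """Return the stored best trajectory from every region."""
        return [traj for _, traj in self.best.values()]


if __name__ == "__main__":
    buf = RegionalDemoBuffer(cell_size=0.5)
    # After each episode, offer the rollout to the buffer with its return.
    buf.consider(goal=(0.2, 0.7), trajectory=["s0", "a0", "s1"], score=1.0)
    buf.consider(goal=(0.3, 0.6), trajectory=["s0", "a1", "s2"], score=2.0)  # same region, better
    buf.consider(goal=(1.4, 0.1), trajectory=["s0", "a2", "s3"], score=0.5)  # new region
    print(len(buf.demonstrations()))  # 2: one best trajectory per region
```

During training, the stored trajectories would be mixed into the agent's updates, for example through an imitation loss, which is the general self-imitation pattern the abstract describes.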

Keywords: Reinforcement learning; Robotic grasping; Self-imitation learning; Sparse reward function; Task space division.

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Reinforcement, Psychology
  • Reward