Action-driven contrastive representation for reinforcement learning

PLoS One. 2022 Mar 18;17(3):e0265456. doi: 10.1371/journal.pone.0265456. eCollection 2022.

Abstract

In reinforcement learning, reward-driven feature learning directly from high-dimensional images faces two challenges: sample efficiency for solving control tasks and generalization to unseen observations. Prior works have addressed these issues by learning representations from pixel inputs, but their representations were either vulnerable to the high diversity inherent in environments or failed to capture the characteristics essential for solving control tasks. To mitigate these problems, we propose a novel contrastive representation method, the Action-Driven Auxiliary Task (ADAT), which forces the representation to concentrate on the features essential for deciding actions and to ignore control-irrelevant details. Using an augmented state-action dictionary, the agent learns a representation that maximizes agreement between observations sharing the same action. The proposed method significantly outperforms model-free and model-based algorithms on Atari and OpenAI ProcGen, widely used benchmarks for sample efficiency and generalization.
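
The abstract gives no implementation details, but the core idea (maximize agreement between observations that share the same action) resembles a supervised-contrastive objective with actions as labels. The sketch below, in PyTorch, illustrates one plausible form of such a loss; the function name, temperature value, and masking choices are illustrative assumptions, not the paper's actual method.

    import torch
    import torch.nn.functional as F

    def action_driven_contrastive_loss(embeddings, actions, temperature=0.1):
        # embeddings: (N, D) encoder outputs for a batch of observations.
        # actions:    (N,) integer action labels for those observations.
        # Hypothetical sketch: observations sharing an action are positives.
        z = F.normalize(embeddings, dim=1)      # cosine-similarity space
        sim = z @ z.t() / temperature           # (N, N) pairwise similarities

        # Positive mask: pairs with the same action, excluding self-pairs.
        same_action = actions.unsqueeze(0) == actions.unsqueeze(1)
        eye = torch.eye(len(actions), dtype=torch.bool, device=z.device)
        pos_mask = same_action & ~eye

        # Log-softmax over all non-self pairs, averaged over each anchor's
        # positives (standard supervised-contrastive formulation).
        sim = sim.masked_fill(eye, float('-inf'))
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_counts = pos_mask.sum(dim=1).clamp(min=1)
        loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts

        # Only anchors that actually have a same-action positive contribute.
        has_pos = pos_mask.any(dim=1)
        return loss[has_pos].mean()

In practice such an auxiliary loss would be added to the usual RL objective, with the actions taken from the agent's replayed transitions; the paper's augmented state-action dictionary presumably supplies these observation-action pairs.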

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Learning
  • Reinforcement, Psychology*
  • Reward

Grants and funding

This project was jointly supported by the Swarm AI Learning Framework Development Project of Agency For Defense Development and National Research Foundation of Korea (NRF) grant funded by the Korea government (No.2021R1A2C2008855). The funders provided support in the form of salaries for authors, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.