Strangeness-driven exploration in multi-agent reinforcement learning

Neural Netw. 2024 Apr:172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.

Abstract

In this study, a novel exploration method for multi-agent reinforcement learning (MARL) based on centralized training and decentralized execution (CTDE) is introduced. The method uses the concept of strangeness, which is determined by evaluating (1) the unfamiliarity of the observations an agent encounters and (2) the unfamiliarity of the entire state the agents visit. An exploration bonus derived from this strangeness is combined with the extrinsic reward obtained from the environment to form a mixed reward, which is then used to train CTDE-based MARL algorithms. In addition, a separate action-value function is proposed to prevent a high exploration bonus from overwhelming sensitivity to the extrinsic reward during training; this function is used to design the behavioral policy that generates transitions. The proposed method is largely unaffected by the stochastic transitions commonly observed in MARL tasks and improves the stability of CTDE-based MARL algorithms when they are combined with an exploration method. We illustrate the advantages of our approach through didactic examples and by demonstrating substantial performance gains when the proposed exploration method is applied to CTDE-based MARL algorithms. These evaluations show that our method outperforms state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark, underscoring its effectiveness in improving MARL.
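To make the mixed-reward idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes strangeness is measured as the reconstruction error of small autoencoders over (1) each agent's local observations and (2) the global state available during centralized training. All names (AutoEncoder, mixed_reward, beta) and the specific weighting are illustrative assumptions.

    # Minimal sketch (assumed, not the paper's code): strangeness as
    # autoencoder reconstruction error, combined with the extrinsic reward.
    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        """Tiny MLP autoencoder; reconstruction error serves as unfamiliarity."""
        def __init__(self, dim: int, hidden: int = 32):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
            self.dec = nn.Linear(hidden, dim)

        def error(self, x: torch.Tensor) -> torch.Tensor:
            # Per-sample mean squared reconstruction error.
            return ((self.dec(self.enc(x)) - x) ** 2).mean(dim=-1)

    def mixed_reward(r_ext, obs, state, obs_ae, state_ae, beta=0.1):
        """r_mix = r_ext + beta * strangeness bonus (hypothetical weighting).

        obs:   (n_agents, obs_dim) local observations of all agents
        state: (state_dim,) global state, available in centralized training
        """
        with torch.no_grad():
            obs_strangeness = obs_ae.error(obs).mean()   # agent-level novelty
            state_strangeness = state_ae.error(state)    # state-level novelty
        bonus = obs_strangeness + state_strangeness
        return r_ext + beta * bonus.item()

    # Usage with illustrative dimensions:
    obs_ae, state_ae = AutoEncoder(16), AutoEncoder(48)
    r_mix = mixed_reward(1.0, torch.randn(3, 16), torch.randn(48), obs_ae, state_ae)

In this sketch the autoencoders would be trained online on visited observations and states, so the bonus decays as regions become familiar; the separate action-value function described in the abstract would then be trained alongside to keep the behavioral policy sensitive to the extrinsic reward.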

Keywords: Curiosity; Exploration; Multi-agent reinforcement learning; Strangeness.

MeSH terms

  • Algorithms
  • Benchmarking
  • Learning*
  • Reinforcement, Psychology*
  • Reward