Multiexperience-Assisted Efficient Multiagent Reinforcement Learning

IEEE Trans Neural Netw Learn Syst. 2023 Apr 10:PP. doi: 10.1109/TNNLS.2023.3264275. Online ahead of print.

Abstract

Recently, multiagent reinforcement learning (MARL) has shown great potential for learning cooperative policies in multiagent systems (MASs). However, a noticeable drawback of current MARL is its low sample efficiency, which requires a huge number of interactions with the environment. This interaction burden greatly hinders the real-world application of MARL. Fortunately, effectively incorporating experience knowledge can help MARL quickly find effective solutions and thus significantly alleviate this drawback. In this article, a novel multiexperience-assisted reinforcement learning (MEARL) method is proposed to improve the learning efficiency of MASs. Specifically, monotonicity-constrained reward shaping is designed using expert experience to provide additional individual rewards that guide multiagent learning efficiently, while guaranteeing the invariance of the team optimization objective. Furthermore, a reward distribution estimator is developed to model the implicit reward distribution of the environment using transition experience collected from it (state-action pairs, rewards, and next states). This estimator predicts the expected reward of each agent for the action taken, enabling accurate estimation of the state value function and accelerating its convergence. The performance of MEARL is evaluated on two multiagent environment platforms: our designed unmanned aerial vehicle combat (UAV-C) environment and StarCraft II Micromanagement (SCII-M). Simulation results demonstrate that the proposed MEARL greatly improves the learning efficiency and performance of MASs and outperforms state-of-the-art methods on multiagent tasks.
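To make the reward distribution estimator concrete, the sketch below shows one plausible realization: a small MLP regressor trained on collected transitions to predict the expected reward of a state-action pair. The abstract does not specify the estimator's architecture or training objective, so the class names, network sizes, MSE loss, and PyTorch framework here are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class RewardEstimator(nn.Module):
    """Illustrative MLP that predicts the expected reward of a state-action pair.

    Trained on transition samples (state, action, reward, next state) drawn
    from the environment; all hyperparameters below are assumptions.
    """
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Concatenate state and action features and regress a scalar reward.
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)


def update_estimator(estimator, optimizer, batch):
    """One regression step toward the observed rewards in a sampled batch."""
    states, actions, rewards = batch  # tensors sampled from a replay buffer
    predicted = estimator(states, actions)
    loss = nn.functional.mse_loss(predicted, rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage with random tensors standing in for collected transitions.
    state_dim, action_dim, batch_size = 8, 4, 32
    estimator = RewardEstimator(state_dim, action_dim)
    optimizer = torch.optim.Adam(estimator.parameters(), lr=1e-3)
    batch = (
        torch.randn(batch_size, state_dim),
        torch.randn(batch_size, action_dim),
        torch.randn(batch_size),
    )
    print(update_estimator(estimator, optimizer, batch))
```

In a full MARL pipeline, the predicted expected reward from such an estimator could be substituted into value-function targets in place of noisy single-sample rewards, which is one way the faster convergence described in the abstract might be realized.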