Hierarchical Attention Master-Slave for heterogeneous multi-agent reinforcement learning

Jiao Wang; Mingrui Yuan; Yun Li; Zihui Zhao

doi:10.1016/j.neunet.2023.02.037

Neural Netw. 2023 May:162:359-368. doi: 10.1016/j.neunet.2023.02.037. Epub 2023 Mar 4.

Authors

Jiao Wang¹, Mingrui Yuan², Yun Li³, Zihui Zhao⁴

Affiliations

¹ College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China. Electronic address: wangjiao@ise.neu.edu.cn.
² College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China. Electronic address: 15838868117@163.com.
³ College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China. Electronic address: 18236884385@163.com.
⁴ College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China.

PMID: 36940496
DOI: 10.1016/j.neunet.2023.02.037

Abstract

Most multi-agent reinforcement learning (MARL) approaches optimize strategies through self-improvement, while ignoring the limitation that homogeneous agents may each serve only a single function. In reality, however, complex tasks tend to require coordinating various types of agents so that they can leverage one another's advantages. How to establish appropriate communication among them and optimize their decisions is therefore a vital research issue. To this end, we propose Hierarchical Attention Master-Slave (HAMS) MARL, in which the Hierarchical Attention balances the weight allocation within and among clusters, and the Master-Slave architecture endows agents with independent reasoning and individual guidance. With this design, information fusion, especially among clusters, is implemented effectively and excessive communication is avoided; moreover, selectively composed actions optimize decision-making. We evaluate HAMS on both small- and large-scale heterogeneous StarCraft II micromanagement tasks. The proposed algorithm achieves win rates of more than 80% in all evaluation scenarios and over 90% on the largest map. The experiments demonstrate a maximum improvement in win rate of 47% over the best known algorithm. The results show that our proposal outperforms recent state-of-the-art approaches and provides a novel idea for heterogeneous multi-agent policy optimization.
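
The abstract does not give the architecture in detail, so the following is only a minimal sketch of what "weight allocation within and among clusters" could look like, not the authors' implementation. It assumes scaled dot-product self-attention with identity projections (no learned weights), mean pooling to summarize each cluster, and concatenation to fuse the two levels. The names `attend`, `hierarchical_attention`, `features`, and `clusters` are hypothetical and chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors):
    # Scaled dot-product self-attention with identity projections
    # (an assumption; a real model would learn query/key/value maps).
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * vectors[j][c] for j in range(len(vectors)))
                        for c in range(d)])
    return outputs, weights

def hierarchical_attention(features, clusters):
    # features: one feature vector per agent.
    # clusters: lists of agent indices, one list per agent type (hypothetical grouping).
    d = len(features[0])
    intra_out, summaries = {}, []
    # Level 1: attention among agents of the same cluster.
    for members in clusters:
        out, _ = attend([features[i] for i in members])
        for i, vec in zip(members, out):
            intra_out[i] = vec
        # Mean pooling as the cluster summary (an assumption).
        summaries.append([sum(v[c] for v in out) / len(out) for c in range(d)])
    # Level 2: attention among cluster summaries.
    inter_out, inter_weights = attend(summaries)
    # Fuse by concatenating each agent's intra-cluster vector
    # with its cluster's inter-cluster context.
    fused = [None] * len(features)
    for k, members in enumerate(clusters):
        for i in members:
            fused[i] = intra_out[i] + inter_out[k]
    return fused, inter_weights

# Five agents with 2-dimensional features, split into two clusters.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8], [0.9, 0.1]]
clusters = [[0, 1], [2, 3, 4]]
fused, inter_weights = hierarchical_attention(features, clusters)
```

In this sketch each agent ends up with a vector twice the input width: the first half mixes information from its own cluster, the second half carries context from all clusters. Only the cluster summaries take part in the second level, which is one plausible way to limit communication between agents of different types.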

Keywords: Communication; Cooperative games; Heterogeneous agents; Multi-agent reinforcement learning; Self-attention.

MeSH terms

Algorithms
Communication
Learning*
Problem Solving
Reinforcement, Psychology*