Multi-agent Continuous Control with Generative Flow Networks

Shuang Luo; Yinchuan Li; Shunyu Liu; Xu Zhang; Yunfeng Shao; Chao Wu

Neural Netw. 2024 Jun:174:106243. doi: 10.1016/j.neunet.2024.106243. Epub 2024 Mar 20.

Authors

Shuang Luo¹, Yinchuan Li², Shunyu Liu³, Xu Zhang⁴, Yunfeng Shao⁵, Chao Wu⁶

Affiliations

¹ School of Public Affairs, Zhejiang University, Hangzhou 310027, China. Electronic address: luoshuang@zju.edu.cn.
² Huawei Noah's Ark Lab, Beijing 100085, China. Electronic address: liyinchuan@huawei.com.
³ College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China. Electronic address: liushunyu@zju.edu.cn.
⁴ School of Artificial Intelligence, Xidian University, Xi'an 710126, China. Electronic address: zhang.xu@xidian.edu.cn.
⁵ Huawei Noah's Ark Lab, Beijing 100085, China. Electronic address: shaoyunfeng@huawei.com.
⁶ School of Public Affairs, Zhejiang University, Hangzhou 310027, China. Electronic address: chao.wu@zju.edu.cn.

PMID: 38531123
DOI: 10.1016/j.neunet.2024.106243

Abstract

Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the probability of reaching a final state is proportional to its reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their application to multi-agent systems, especially continuous joint-control problems. In this paper, we propose a novel Multi-Agent generative Continuous Flow Networks (MACFN) method that enables multiple agents to cooperatively explore various compositional continuous objects. Technically, MACFN trains decentralized individual-flow-based policies in a centralized global-flow-based matching fashion. During centralized training, MACFN introduces a continuous flow decomposition network to deduce the flow contribution of each agent in the presence of only global rewards. Agents can then take actions based solely on their assigned local flows in a decentralized way, forming a joint policy distribution proportional to the rewards. To guarantee the expressiveness of continuous flow decomposition, we theoretically derive a consistency condition on the decomposition network. Experimental results demonstrate that the proposed method outperforms state-of-the-art counterparts and exhibits better exploration capability. Our code is available at https://github.com/isluoshuang/MACFN.
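
The idea of decentralized policies that together form a reward-proportional joint distribution can be illustrated with a toy sketch. It is not the MACFN implementation: it assumes a single decision step, three discretized actions per agent, hand-picked flow values, and a simple product-form decomposition in which the global flow of a joint action is the product of the agents' local flows. MACFN itself learns a continuous decomposition network subject to a consistency condition. All names below (AGENT_FLOWS, global_flow, local_policy, sample_joint_action, joint_probability) are hypothetical.

```python
import random

# Hypothetical local flows f_i(a_i) over three discretized actions per agent.
# The values are illustrative assumptions, not taken from the paper.
AGENT_FLOWS = [
    [1.0, 2.0, 1.0],  # agent 0
    [3.0, 1.0, 0.0],  # agent 1
]


def global_flow(joint_action):
    """Assumed product decomposition: F(a_0, a_1) = f_0(a_0) * f_1(a_1).

    In this one-step toy, the global flow of a joint action plays the
    role of the global reward of the resulting terminal state.
    """
    flow = 1.0
    for flows, action in zip(AGENT_FLOWS, joint_action):
        flow *= flows[action]
    return flow


def local_policy(agent):
    """Normalize one agent's local flows into its action distribution."""
    total = sum(AGENT_FLOWS[agent])
    return [f / total for f in AGENT_FLOWS[agent]]


def sample_joint_action(rng):
    """Decentralized execution: each agent samples from its own flows only."""
    return tuple(
        rng.choices(range(len(flows)), weights=flows)[0]
        for flows in AGENT_FLOWS
    )


def joint_probability(joint_action):
    """Probability of a joint action under the independent local policies."""
    prob = 1.0
    for agent, action in enumerate(joint_action):
        prob *= local_policy(agent)[action]
    return prob


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_joint_action(rng))
    print(joint_probability((1, 0)), global_flow((1, 0)))
```

Under the product assumption, the joint probability of every joint action equals its global flow divided by the total flow, so agents that only see their own local flows still sample joint actions in proportion to the global reward. When the global reward does not factorize in this way, a fixed product form is no longer sufficient, which is the setting in which a learned decomposition and a consistency condition become relevant.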

Keywords: Continuous Control; Generative Flow Networks; Multi-agent System.

MeSH terms

Learning*
Policy*
Reinforcement, Psychology
Reward