We consider the situation in which cooperating agents learn to achieve a common goal based solely on a global return that results from all agents' behavior. The method proposed is based on taking into account the agents' activity, which can be any additional information to help solving multi-agent decentralized learning problems. We propose a gradient ascent algorithm and assess its performance on synthetic data.
© 2023. Springer Nature Limited.