Dynamical systems as a level of cognitive analysis of multi-agent learning: Algorithmic foundations of temporal-difference learning dynamics

Wolfram Barfuss

doi:10.1007/s00521-021-06117-0

Neural Comput Appl. 2022;34(3):1653-1671. doi: 10.1007/s00521-021-06117-0. Epub 2021 Jun 23.

Author

Wolfram Barfuss ¹ ²

Affiliations

¹ School of Mathematics, University of Leeds, Leeds, UK.
² Tübingen AI Center, University of Tübingen, Tübingen, Germany.

Abstract

A dynamical systems perspective on multi-agent learning, based on the link between evolutionary game theory and reinforcement learning, provides an improved, qualitative understanding of the emerging collective learning dynamics. However, confusion exists with respect to how this dynamical systems account of multi-agent learning should be interpreted. In this article, I propose to embed the dynamical systems description of multi-agent learning into different abstraction levels of cognitive analysis. The purpose of this work is to make the connections between these levels explicit in order to gain improved insight into multi-agent learning. I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning. I find that its deterministic dynamical systems description follows a minimum free-energy principle and unifies a boundedly rational account of game theory with decision-making under uncertainty. I then propose an on-line sample-batch temporal-difference algorithm which is characterized by the combination of applying a memory-batch and separated state-action value estimation. I find that this algorithm serves as a micro-foundation of the deterministic learning equations by showing that its learning trajectories approach the ones of the deterministic learning equations under large batch sizes. Ultimately, this framework of embedding a dynamical systems description into different abstraction levels gives guidance on how to unleash the full potential of the dynamical systems approach to multi-agent learning.

Keywords: Evolutionary game theory; Levels of analysis; Multi-agent learning; Temporal-difference reinforcement learning.

Grants and funding

MR/S032525/1/MRC_/Medical Research Council/United Kingdom