Taming Lagrangian chaos with multi-objective reinforcement learning

Chiara Calascibetta; Luca Biferale; Francesco Borra; Antonio Celani; Massimo Cencini

doi:10.1140/epje/s10189-023-00271-0

Eur Phys J E Soft Matter. 2023 Mar 3;46(3):9. doi: 10.1140/epje/s10189-023-00271-0.

Authors

Chiara Calascibetta¹, Luca Biferale², Francesco Borra³, Antonio Celani⁴, Massimo Cencini⁵ ⁶

Affiliations

¹ Department of Physics & INFN, University of Rome 'Tor Vergata', Via della Ricerca Scientifica 1, 00133, Rome, Italy. calascibetta@roma2.infn.it.
² Department of Physics & INFN, University of Rome 'Tor Vergata', Via della Ricerca Scientifica 1, 00133, Rome, Italy.
³ Laboratory of Physics of the École Normale Supérieure, 24 Rue Lhomond, 75005, Paris, France.
⁴ Quantitative Life Sciences, The Abdus Salam International Centre for Theoretical Physics, ICTP, 34151, Trieste, Italy.
⁵ Istituto dei Sistemi Complessi, CNR, Via dei Taurini 19, 00185, Rome, Italy.
⁶ INFN 'Tor Vergata', Rome, Italy.

PMID: 36867296
DOI: 10.1140/epje/s10189-023-00271-0

Abstract

We consider the problem of two active particles in 2D complex flows with the multi-objective goals of minimizing both the dispersion rate and the control activation cost of the pair. We approach the problem by means of multi-objective reinforcement learning (MORL), combining scalarization techniques together with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies are dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, τ. We show that there is a range of decision times, in between the Lyapunov time and the continuous updating limit, where reinforcement learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller τ all a priori heuristic strategies become Pareto optimal.
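The notions of scalarization and Pareto dominance used in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the linear form of the scalarization, and the candidate cost pairs are all assumptions made for illustration, and the numbers are not results from the paper. Each candidate strategy is summarized by two costs to be minimized, standing in for dispersion rate and control activation cost.

```python
from typing import List, Sequence, Tuple

# Hypothetical summary of a strategy: (dispersion_cost, control_cost).
# Both entries are to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize(costs: Point, weight: float) -> float:
    """Linear scalarization (an assumed form): weight in [0, 1] trades off the two costs."""
    return weight * costs[0] + (1.0 - weight) * costs[1]


# Illustrative cost pairs only; not data from the paper.
candidates = [(1.0, 0.0), (0.6, 0.3), (0.7, 0.5), (0.2, 0.9), (0.9, 0.9)]

front = pareto_front(candidates)
best_equal_weight = min(candidates, key=lambda c: scalarize(c, 0.5))

print("Pareto front:", front)
print("Best for weight 0.5:", best_equal_weight)
```

Sweeping `weight` from 0 to 1 and keeping the minimizer for each value traces out trade-off solutions on the front, which is the general idea behind combining scalarization with a single-objective learner such as Q-learning.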

Grants and funding

882340/HORIZON EUROPE European Research Council