Scalable-MADDPG-Based Cooperative Target Invasion for a Multi-USV System

IEEE Trans Neural Netw Learn Syst. 2023 Sep 7:PP. doi: 10.1109/TNNLS.2023.3309689. Online ahead of print.

Abstract

This article proposes a scalable deep reinforcement learning (DRL) method that enables a multiple unmanned surface vehicle (multi-USV) system to perform cooperative target invasion. The multi-USV system, which consists of multiple invaders, must invade the target areas within a specified time. A novel scalable reinforcement learning (RL) method, called Scalable-MADDPG, is proposed, in which the scale of the multi-USV system can be changed at any time without interrupting the training process. To mitigate the policy oscillation that arises after applying Scalable-MADDPG, a bidirectional long short-term memory (Bi-LSTM) network is constructed. Moreover, an improved ϵ-greedy strategy is proposed to balance exploration and exploitation in RL. Furthermore, to enhance the robustness of the optimal policy, Ornstein-Uhlenbeck (OU) noise is added to this improved ϵ-greedy strategy during training. Finally, the scalable RL method is applied to help the multi-USV system perform cooperative target invasion in complex marine environments. The effectiveness of Scalable-MADDPG is demonstrated through three experiments.
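
The abstract does not specify how the improved ϵ-greedy strategy and the OU noise are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method: with probability ϵ, the deterministic actor output is perturbed by temporally correlated OU noise; otherwise it is used unchanged. The class and function names (`OUNoise`, `epsilon_schedule`, `select_action`), the linear decay of ϵ, and all numeric defaults are assumptions introduced here for illustration.

```python
import numpy as np


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (hypothetical helper):
    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu = mu * np.ones(dim)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.state = self.mu.copy()

    def reset(self):
        # Restart the process at its long-run mean (e.g. at episode start).
        self.state = self.mu.copy()

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion.
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = (self.sigma * np.sqrt(self.dt)
                     * self.rng.standard_normal(self.state.shape))
        self.state = self.state + drift + diffusion
        return self.state.copy()


def epsilon_schedule(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    # Assumed linear decay; the paper's improved schedule is not given
    # in the abstract.
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(policy_action, step, noise, rng, low=-1.0, high=1.0):
    # With probability epsilon, explore by adding OU noise to the actor
    # output; otherwise exploit the actor output as-is.
    eps = epsilon_schedule(step)
    if rng.random() < eps:
        action = policy_action + noise.sample()
    else:
        action = policy_action
    # Keep the action inside the (assumed) normalized bounds.
    return np.clip(action, low, high)
```

In a training loop under these assumptions, `noise.reset()` would be called at the start of each episode and `select_action` once per agent per step, with `policy_action` taken from that agent's actor network.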