Multi-Objective Optimization in Air-to-Air Communication System Based on Multi-Agent Deep Reinforcement Learning

Shaofu Lin; Yingying Chen; Shuopeng Li

doi:10.3390/s23239541

Multi-Objective Optimization in Air-to-Air Communication System Based on Multi-Agent Deep Reinforcement Learning

Sensors (Basel). 2023 Nov 30;23(23):9541. doi: 10.3390/s23239541.

Authors

Shaofu Lin¹, Yingying Chen¹, Shuopeng Li¹

Affiliation

¹ Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.

Abstract

With the advantages of real-time data processing and flexible deployment, unmanned aerial vehicle (UAV)-assisted mobile edge computing systems are widely used in both civil and military fields. However, due to limited energy, it is usually difficult for UAVs to stay in the air for long periods and to perform computational tasks. In this paper, we propose a full-duplex air-to-air communication system (A2ACS) model combining mobile edge computing and wireless power transfer technologies, aiming to effectively reduce the computational latency and energy consumption of UAVs, while ensuring that the UAVs do not interrupt the mission or leave the work area due to insufficient energy. In this system, UAVs collect energy from external air-edge energy servers (AEESs) to power onboard batteries and offload computational tasks to AEESs to reduce latency. To optimize the system's performance and balance the four objectives, including the system throughput, the number of low-power alarms of UAVs, the total energy received by UAVs and the energy consumption of AEESs, we develop a multi-objective optimization framework. Considering that AEESs require rapid decision-making in a dynamic environment, an algorithm based on multi-agent deep deterministic policy gradient (MADDPG) is proposed, to optimize the AEESs' service location and to control the power of energy transfer. While training, the agents learn the optimal policy given the optimization weight conditions. Furthermore, we adopt the K-means algorithm to determine the association between AEESs and UAVs to ensure fairness. Simulated experiment results show that the proposed MODDPG (multi-objective DDPG) algorithm has better performance than the baseline algorithms, such as the genetic algorithm and other deep reinforcement learning algorithms.

Keywords: mobile edge computing (MEC); multi-agent deep reinforcement learning (MADRL); multi-objective optimization (MOO); unmanned aerial vehicle (UAV); wireless power transfer (WPT).

Grants and funding

This research received no external funding.