D2D Mobile Relaying Meets NOMA-Part II: A Reinforcement Learning Perspective

Safaa Driouech; Essaid Sabir; Mounir Ghogho; El-Mehdi Amhoud

doi:10.3390/s21051755

Sensors (Basel). 2021 Mar 4;21(5):1755. doi: 10.3390/s21051755.

Authors

Safaa Driouech¹ ², Essaid Sabir¹ ³, Mounir Ghogho⁴, El-Mehdi Amhoud⁵

Affiliations

¹ NEST Research Group, LRI Lab., ENSEM, Hassan II University of Casablanca, Casablanca 20000, Morocco.
² Laboratoire de Recherche en Informatique, Sorbonne Université, CNRS, LIP6, F-75005 Paris, France.
³ Department of Computer Science, University of Quebec at Montreal, Montreal, QC H2L 2C4, Canada.
⁴ TICLab, International University of Rabat, Rabat 11100, Morocco.
⁵ School of Computer Science, Mohammed VI Polytechnic University, Ben Guerir 43150, Morocco.

Abstract

Structureless communications such as Device-to-Device (D2D) relaying are of paramount importance to improving the performance of today's mobile networks. Such a communication paradigm requires a certain level of intelligence at the device level, allowing each device to interact with the environment and make proper decisions. However, decentralizing decision-making may induce paradoxical outcomes that result in a drop in performance, which motivates the design of self-organizing yet efficient systems. We propose that each device decides either to connect directly to the eNodeB or to get access via another device through a D2D link. In the first part of this article, we described a biform game framework to analyze the proposed self-organized system's performance under pure and mixed strategies. In this second part, we use two reinforcement learning (RL) algorithms that enable devices to self-organize and learn their pure/mixed equilibrium strategies in a fully distributed fashion. Decentralized RL algorithms are shown to play an important role in allowing devices to self-organize and reach satisfactory performance with incomplete information or even under uncertainty. Simulations illustrate the importance of D2D relaying and assess how our learning schemes perform under slow and fast channel fading.

Keywords: 5G/B5G/6G; D2D relaying; NOMA/OMA; Nash equilibrium; biform game; distributed reinforcement learning; self-organized devices.