Neuro-Evolutionary Direct Policy Search for Multiobjective Optimal Control

IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5926-5938. doi: 10.1109/TNNLS.2021.3071960. Epub 2022 Oct 5.

Abstract

Direct policy search (DPS) is emerging as one of the most effective and widely applied reinforcement learning (RL) methods for designing optimal control policies for multiobjective Markov decision processes (MOMDPs). Traditionally, DPS defines the control policy within a preselected functional class and searches for its optimal parameterization with respect to a given set of objectives. The functional class should be tailored to the problem at hand, and its selection is crucial, as it determines the search space within which solutions can be found. In MOMDPs, each objective tradeoff induces a different fitness landscape, calling for a tradeoff-dynamic selection of the functional class. Yet, in state-of-the-art applications, the policy class is generally selected a priori and kept constant across the multidimensional objective space. In this work, we present a novel policy search routine called neuro-evolutionary multiobjective DPS (NEMODPS), which extends the DPS problem formulation to jointly search the policy functional class and its parameterization in a hyperspace containing both policy architectures and coefficients. NEMODPS begins with a population of minimally structured approximating networks and progressively builds more sophisticated architectures through topological and parametric mutation, crossover, and selection of the fittest individuals with respect to multiple objectives. We tested NEMODPS on the problem of designing the control policy of a multipurpose water system. Numerical results show that the tradeoff-dynamic structural and parametric policy search of NEMODPS is consistent across multiple runs and outperforms solutions designed via traditional DPS with predefined policy topologies.
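
The abstract outlines the core loop of the method: start from minimally structured networks, apply topological and parametric variation, and retain the individuals that are fittest with respect to multiple objectives. The sketch below is an illustrative reading of that loop only, not the authors' implementation; the network encoding, the toy objectives, and all parameter values are invented for illustration, and crossover is omitted for brevity.

```python
# Illustrative sketch of a neuro-evolutionary multiobjective policy search loop.
# Everything here (encoding, objectives, constants) is a hypothetical stand-in.
import random

random.seed(0)
N_INPUTS, N_OUTPUTS = 2, 1  # assumed policy input/output sizes


def minimal_network():
    """Minimal initial architecture: direct input-to-output connections only."""
    return {"hidden": 0,
            "weights": [random.uniform(-1, 1) for _ in range(N_INPUTS * N_OUTPUTS)]}


def mutate(net):
    """Topological mutation (add a hidden unit) or parametric mutation (perturb a weight)."""
    child = {"hidden": net["hidden"], "weights": list(net["weights"])}
    if random.random() < 0.2:  # structural mutation: grow the architecture
        child["hidden"] += 1
        child["weights"] += [random.uniform(-1, 1) for _ in range(N_INPUTS + N_OUTPUTS)]
    else:                      # parametric mutation: perturb one coefficient
        i = random.randrange(len(child["weights"]))
        child["weights"][i] += random.gauss(0, 0.3)
    return child


def evaluate(net):
    """Toy surrogate objectives standing in for simulated control performance."""
    complexity = net["hidden"] + 0.01 * sum(abs(w) for w in net["weights"])
    fitness = -sum(w * w for w in net["weights"])
    return (fitness, -complexity)  # both objectives to be maximized


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population):
    """Keep only policies not dominated by any other on all objectives."""
    scored = [(net, evaluate(net)) for net in population]
    return [n for n, s in scored
            if not any(dominates(t, s) for _, t in scored if t != s)]


population = [minimal_network() for _ in range(20)]
for generation in range(50):
    offspring = [mutate(random.choice(population)) for _ in range(20)]
    population = pareto_front(population + offspring)[:20]

print(f"Final front size: {len(population)}")
```

Because selection operates on the Pareto front rather than on a single scalar reward, the surviving population can contain different architectures at different objective tradeoffs, which is the tradeoff-dynamic behavior the abstract describes.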