Efficient Multitask Reinforcement Learning Without Performance Loss

IEEE Transactions on Neural Networks and Learning Systems, 2023. doi: 10.1109/TNNLS.2023.3281473. Online ahead of print.

Abstract

We propose an iterative sparse Bayesian policy optimization (ISBPO) scheme as an efficient multitask reinforcement learning (RL) method for industrial control applications that require both high performance and cost-effective implementation. Under continual learning scenarios in which multiple control tasks are learned sequentially, the proposed ISBPO scheme preserves previously learned knowledge without performance loss (PL), enables efficient resource use, and improves the sample efficiency of learning new tasks. Specifically, the proposed ISBPO scheme continually adds new tasks to a single policy neural network while completely preserving the control performance of previously learned tasks through an iterative pruning method. To create free-weight space for adding new tasks, each task is learned through a pruning-aware policy optimization method called sparse Bayesian policy optimization (SBPO), which ensures efficient allocation of the limited policy-network resources among multiple tasks. Furthermore, the weights allocated to previous tasks are shared and reused when learning new tasks, thereby improving both the sample efficiency and the performance of new-task learning. Simulations and real-world experiments demonstrate that the proposed ISBPO scheme is well suited to sequentially learning multiple tasks in terms of performance preservation, efficient resource use, and sample efficiency.
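The abstract describes a mask-based continual-learning mechanism: each task claims a subset of a shared policy network's weights, pruning frees the remainder for future tasks, and earlier tasks' frozen weights are reused when learning new tasks. The following is a minimal NumPy sketch of that general idea under PackNet-style assumptions; the class, method names, and simple magnitude-pruning rule are illustrative only and do not reproduce the paper's SBPO procedure, whose pruning criterion is Bayesian.

```python
# Illustrative sketch of mask-based weight allocation for continual multitask RL.
# Not the paper's algorithm: the pruning rule and "training" step are placeholders.
import numpy as np

rng = np.random.default_rng(0)

class SharedPolicyLayer:
    def __init__(self, n_in, n_out, keep_ratio=0.3):
        self.W = np.zeros((n_out, n_in))            # shared weight matrix
        self.owner = -np.ones((n_out, n_in), int)   # -1 = free, k = owned by task k
        self.keep_ratio = keep_ratio

    def trainable_mask(self):
        """Weights a new task may modify: only the still-free positions."""
        return self.owner == -1

    def forward_mask(self, task_id):
        """Weights used for task_id: its own plus those frozen by earlier tasks (reuse)."""
        return (self.owner >= 0) & (self.owner <= task_id)

    def learn_task(self, steps=100, lr=0.1):
        """Stand-in for policy optimization: random updates restricted to free weights."""
        free = self.trainable_mask()
        for _ in range(steps):
            grad = rng.normal(size=self.W.shape)    # placeholder for a policy gradient
            self.W[free] += lr * grad[free]         # earlier tasks' weights stay frozen

    def prune_and_assign(self, task_id):
        """Keep only the largest newly trained weights; release the rest for future tasks."""
        free = self.trainable_mask()
        mags = np.abs(self.W[free])
        k = max(1, int(self.keep_ratio * mags.size))
        thresh = np.sort(mags)[-k]
        keep = free & (np.abs(self.W) >= thresh)
        self.W[free & ~keep] = 0.0                  # pruned weights return to the free pool
        self.owner[keep] = task_id

layer = SharedPolicyLayer(n_in=8, n_out=4)
for task in range(3):
    layer.learn_task()
    layer.prune_and_assign(task)
    used = (layer.owner >= 0).mean()
    print(f"after task {task}: {used:.0%} of weights permanently assigned")
```

Because each task's assigned weights are never modified afterward, earlier control performance is preserved exactly, while `forward_mask` shows how later tasks can still read (and benefit from) those frozen weights.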