Adversary Agnostic Robust Deep Reinforcement Learning

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6146-6157. doi: 10.1109/TNNLS.2021.3133537. Epub 2023 Sep 1.

Abstract

Deep reinforcement learning (DRL) policies have been shown to be deceived by perturbations (e.g., random noise or intentional adversarial attacks) on state observations that appear at test time but are unknown during training. To increase the robustness of DRL policies, previous approaches assume that explicit adversarial information can be added to the training process, so that the learned policy also generalizes to these perturbed observations. However, such approaches not only make robustness improvement more expensive but may also leave the model prone to other kinds of attacks in the wild. In contrast, we propose an adversary agnostic robust DRL paradigm that does not require learning from predefined adversaries. To this end, we first show theoretically that robustness can indeed be achieved independently of the adversaries in a policy distillation (PD) setting. Motivated by this finding, we propose a new PD loss with two terms: 1) a prescription gap maximization (PGM) loss that simultaneously maximizes the likelihood of the action selected by the teacher policy and the entropy over the remaining actions and 2) a corresponding Jacobian regularization (JR) loss that minimizes the magnitude of the gradients with respect to the input state. Theoretical analysis shows that our distillation loss is guaranteed to increase the prescription gap and hence improve adversarial robustness. Furthermore, experiments on five Atari games confirm that our approach outperforms state-of-the-art baselines.
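To make the two-term distillation loss concrete, the sketch below shows one possible PyTorch formulation based only on the abstract's description; it is not the paper's exact implementation. The function name pgm_jr_loss, the weights entropy_weight and jr_weight, and the choice to regularize the gradient of the teacher-action log-probability are all assumptions introduced for illustration, and the student is assumed to be a network mapping states to action logits.

```python
import torch
import torch.nn.functional as F

def pgm_jr_loss(student, states, teacher_actions, entropy_weight=0.1, jr_weight=1.0):
    """Sketch of the two-term distillation loss described in the abstract.

    PGM term: maximize the likelihood of the teacher-selected action while
    maximizing the entropy over the remaining actions.
    JR term: minimize the magnitude of gradients of the student's output
    with respect to the input state.
    (Hypothetical formulation; the paper's exact loss may differ.)
    """
    states = states.clone().requires_grad_(True)   # enable input gradients for the JR term
    logits = student(states)                       # shape: (batch, num_actions)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    # --- Prescription gap maximization (PGM) ---
    # Log-likelihood of the action prescribed by the teacher policy.
    teacher_logp = log_probs.gather(1, teacher_actions.unsqueeze(1)).squeeze(1)

    # Entropy over the remaining actions (teacher action masked out, then renormalized).
    mask = torch.ones_like(probs).scatter_(1, teacher_actions.unsqueeze(1), 0.0)
    rest = probs * mask
    rest = rest / rest.sum(dim=1, keepdim=True).clamp_min(1e-8)
    rest_entropy = -(rest * rest.clamp_min(1e-8).log()).sum(dim=1)

    pgm_loss = -(teacher_logp + entropy_weight * rest_entropy).mean()

    # --- Jacobian regularization (JR) ---
    # Penalize the squared gradient of the teacher-action log-probability w.r.t. the state.
    grads = torch.autograd.grad(teacher_logp.sum(), states, create_graph=True)[0]
    jr_loss = grads.pow(2).sum(dim=tuple(range(1, grads.dim()))).mean()

    return pgm_loss + jr_weight * jr_loss
```

In this reading, a large prescription gap means the teacher's action keeps a clearly higher probability than any alternative, while the JR term limits how much a small perturbation of the state can change the student's outputs.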