Anomaly Detection and Correction of Optimizing Autonomous Systems With Inverse Reinforcement Learning

IEEE Trans Cybern. 2023 Jul;53(7):4555-4566. doi: 10.1109/TCYB.2022.3213526. Epub 2023 Jun 15.

Abstract

This article considers autonomous systems whose behaviors seek to optimize an objective function. This goes beyond standard applications of condition-based maintenance, which seek to detect faults or failures in nonoptimizing systems. Normal agents optimize a known, accepted objective function, whereas abnormal or misbehaving agents may optimize a renegade objective that does not conform to the accepted one. We provide a unified framework for anomaly detection and correction in optimizing autonomous systems described by differential equations using inverse reinforcement learning (RL). We first define several types of anomalies and false alarms, including noise anomaly, objective function anomaly, intention (control gain) anomaly, abnormal behaviors, noise-anomaly false alarms, and objective false alarms. We then propose model-free inverse RL algorithms to reconstruct the objective functions and intentions for given system behaviors. The inverse RL procedure for anomaly detection and correction consists of three phases: training, detection, and correction. First, in the training phase, inverse RL infers the objective function and intention of the normal system from offline stored data. Second, in the detection phase, inverse RL infers the objective function and intention of the observed test system from online data; these are then compared with those of the nominal system to identify anomalies. Third, in the correction phase, the anomalous system is driven to learn the normal objective and intention. Simulations and experiments on a quadrotor unmanned aerial vehicle (UAV) verify the proposed methods.
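
As a rough illustration of the three-phase procedure described above (not the paper's algorithm), the following Python sketch assumes a linear system with state-feedback control u = -Kx and treats the agent's intention as the gain K. A simple least-squares gain fit stands in for the model-free inverse RL step, and the names infer_gain and detect_anomaly, the detection threshold, and the simulated data are all hypothetical.

    import numpy as np

    # Minimal sketch, assuming a linear system with state feedback u = -K x.
    # The least-squares fit below is a stand-in for the paper's model-free
    # inverse RL step; it only recovers the intention (gain K) from data.

    def infer_gain(states, inputs):
        """Estimate K in u = -K x from stacked samples (states: (N, n), inputs: (N, m))."""
        # Solve states @ K.T ≈ -inputs in the least-squares sense.
        K_T, *_ = np.linalg.lstsq(states, -inputs, rcond=None)
        return K_T.T

    def detect_anomaly(K_test, K_nominal, threshold=0.1):
        """Flag an intention (control-gain) anomaly when the relative gain deviation is large."""
        deviation = np.linalg.norm(K_test - K_nominal) / np.linalg.norm(K_nominal)
        return deviation > threshold, deviation

    rng = np.random.default_rng(0)
    n, m, N = 4, 2, 200

    # Training phase: infer the nominal intention from offline stored data.
    K_true = rng.normal(size=(m, n))                      # normal agent's (unknown) gain
    X_train = rng.normal(size=(N, n))
    U_train = -X_train @ K_true.T + 0.01 * rng.normal(size=(N, m))
    K_nominal = infer_gain(X_train, U_train)

    # Detection phase: infer the test agent's intention from online data and compare.
    K_renegade = K_true + 0.5 * rng.normal(size=(m, n))   # misbehaving agent's gain
    X_test = rng.normal(size=(N, n))
    U_test = -X_test @ K_renegade.T + 0.01 * rng.normal(size=(N, m))
    K_test = infer_gain(X_test, U_test)

    flagged, dev = detect_anomaly(K_test, K_nominal)
    print(f"anomaly detected: {flagged} (relative gain deviation {dev:.2f})")

    # Correction phase: command the anomalous agent back toward the nominal
    # intention, i.e., apply u = -K_nominal x going forward.

In this toy setting the correction step simply reverts to the nominal gain; in the paper, correction drives the anomalous system to learn the normal objective function and intention.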

MeSH terms

  • Algorithms
  • Learning*
  • Reinforcement, Psychology*