Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System

Sensors (Basel). 2023 Nov 16;23(22):9214. doi: 10.3390/s23229214.

Abstract

Security monitoring facilities have recently adopted artificial intelligence (AI) technology to improve both security and performance. However, technical challenges remain in raising system performance, automation, and security efficiency. In this paper, we propose an intelligent anomaly detection and classification method based on deep learning (DL) using multi-modal fusion. The method combines two DL-based schemes: (i) a 3D Convolutional AutoEncoder (3D-AE) for anomaly detection and (ii) the SlowFast neural network for anomaly classification. The 3D-AE detects the points at which abnormal events occur and generates regions of interest (ROI) from those points. The SlowFast model then classifies the abnormal events using the ROI. This multi-modal approach compensates for the weaknesses and leverages the strengths of existing security systems. To make anomaly learning more effective, we also built a new dataset using the virtual environment of Grand Theft Auto 5 (GTA5). The dataset consists of 400 abnormal-state samples and 78 normal-state samples, with clip lengths of 8-20 s. Virtual data collection can supplement the original dataset, because abnormal states are difficult to replicate in the real world. The proposed method achieves a classification accuracy of 85%, compared with 77.5% when only the single classification model is used. Furthermore, we validated the model trained on the GTA dataset using a real-world assault-class dataset of 1300 instances that we reproduced ourselves. Of these, 1100 were classified as assault, for an accuracy of 83.5%. This suggests that the proposed method can also perform well in real-world environments.

Keywords: 3D convolutional autoencoder; GTA dataset; anomaly classification; anomaly detection; multi-modal; slowfast; surveillance and security.
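
As an illustration of the detection stage described in the abstract, the following minimal sketch shows one common way an autoencoder's output can be turned into anomaly points and an ROI: frames that reconstruct poorly are flagged, and the bounding box of the high-error pixels becomes the region passed to a classifier. This is not the authors' implementation. The reconstructions are assumed to be precomputed by some autoencoder, and the function names and both thresholds (`frame_thr`, `pixel_thr`) are hypothetical. The SlowFast classification stage is omitted.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # one grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def error_map(frame: Frame, recon: Frame) -> Frame:
    """Per-pixel squared error between a frame and its reconstruction."""
    return [[(a - b) ** 2 for a, b in zip(row_f, row_r)]
            for row_f, row_r in zip(frame, recon)]


def frame_score(err: Frame) -> float:
    """Mean per-pixel error, used here as the frame's anomaly score."""
    values = [v for row in err for v in row]
    return sum(values) / len(values)


def roi_from_errors(err: Frame, pixel_thr: float) -> Optional[Box]:
    """Bounding box of all pixels whose error exceeds pixel_thr, or None."""
    hits = [(r, c) for r, row in enumerate(err)
            for c, v in enumerate(row) if v > pixel_thr]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


def detect(frames: List[Frame], recons: List[Frame],
           frame_thr: float, pixel_thr: float) -> List[Tuple[int, Optional[Box]]]:
    """Return (frame_index, roi) for each frame scoring above frame_thr."""
    results = []
    for i, (frame, recon) in enumerate(zip(frames, recons)):
        err = error_map(frame, recon)
        if frame_score(err) > frame_thr:  # hypothetical decision rule
            results.append((i, roi_from_errors(err, pixel_thr)))
    return results
```

In a full pipeline of the kind the abstract outlines, each returned ROI would be cropped from the clip and handed to the classification model. How the paper itself derives scores and regions may differ from this thresholding rule.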