Cognitive Refined Augmentation for Video Anomaly Detection in Weak Supervision

Junyeop Lee; Hyunbon Koo; Seongjun Kim; Hanseok Ko

doi:10.3390/s24010058

Sensors (Basel). 2023 Dec 21;24(1):58. doi: 10.3390/s24010058.

Authors

Junyeop Lee¹, Hyunbon Koo², Seongjun Kim², Hanseok Ko¹

Affiliations

¹ School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea.
² Korea Institute of Civil Engineering and Building Technology, Goyang-si 10223, Republic of Korea.

Abstract

Weakly supervised video anomaly detection (WSVAD) assesses anomaly levels in individual frames based on video-level labels. Anomaly scores are computed by evaluating the deviation of distances derived from frames in an unbiased state. WSVAD suffers from false alarms, which stem from various sources; a major contributor is that frame labels are inadequately reflected during the learning process. Multiple instance learning (MIL) has been a pivotal solution to this issue in previous studies, which requires identifying features that discriminate between abnormal and normal segments. At the same time, it is necessary to identify shared biases within the feature space and learn a representative model. In this study, we introduce a novel MIL framework anchored on a memory unit, which augments features based on memory and effectively bridges the gap between normal and abnormal instances. This augmentation is facilitated through the integration of a multi-head attention feature augmentation module and a loss function with a KL divergence term and a Gaussian distribution estimation-based approach. The method identifies distinguishable features and secures the inter-instance distance, thus fortifying the distance metrics between abnormal and normal instances approximated by distributions. The contributions of this research are a novel MIL-based framework for WSVAD and an efficient integration strategy for the augmentation process. Extensive experiments were conducted on the benchmark datasets XD-Violence and UCF-Crime to substantiate the effectiveness of the proposed model.
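
As an illustration of the idea of measuring the distance between normal and abnormal instances "approximated by distribution", the following is a minimal sketch, not the authors' implementation. It assumes each group of instance features is summarized by a diagonal Gaussian and compares the two with the closed-form KL divergence. The function names (fit_gaussian, kl_diag_gaussians), the variance floor, and the toy feature values are all hypothetical; the abstract does not specify how the estimation or the loss is actually constructed.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(samples, var_floor=1e-6):
    """Estimate a per-dimension mean and variance from a list of feature vectors.

    Assumption: a diagonal Gaussian is an adequate summary of the features.
    var_floor is a hypothetical safeguard against zero variance.
    """
    dims = list(zip(*samples))  # transpose: one tuple of values per dimension
    mean = [fmean(d) for d in dims]
    var = [max(pvariance(d), var_floor) for d in dims]
    return mean, var


def kl_diag_gaussians(p, q):
    """Closed-form KL(p || q) for diagonal Gaussians given as (mean, var) pairs."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


# Toy two-dimensional features for normal and abnormal instances (made up).
normal_feats = [[0.0, 1.0], [0.2, 1.2], [-0.2, 0.8]]
abnormal_feats = [[2.0, 3.0], [2.5, 2.5], [1.5, 3.5]]

normal_dist = fit_gaussian(normal_feats)
abnormal_dist = fit_gaussian(abnormal_feats)

# A larger value means the two estimated distributions are further apart.
separation = kl_diag_gaussians(abnormal_dist, normal_dist)
```

Under these assumptions, a training objective could encourage a larger separation value so that abnormal and normal instances stay distinguishable; whether the paper uses the divergence in this direction or in this form is not stated in the abstract.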

Keywords: feature augmentation; multiple instance learning; weakly supervised video anomaly detection.

Grants and funding

20220238-001/Korea Institute of Civil Engineering and Building Technology