Weakly Supervised Violence Detection in Surveillance Video

Sensors (Basel). 2022 Jun 14;22(12):4502. doi: 10.3390/s22124502.

Abstract

Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the risk of missing important events, since humans cannot attend to many targets at once. To overcome this problem, researchers have proposed several methods to detect violent events automatically. So far, most previous studies have focused only on classifying short clips without performing spatial localization. In this work, we tackle this problem by proposing a weakly supervised method to detect spatially and temporally violent actions in surveillance videos using only video-level labels. The proposed method follows a Fast-RCNN-style architecture that has been temporally extended. First, we generate spatiotemporal proposals (action tubes) leveraging pre-trained person detectors, motion appearance (dynamic images), and tracking algorithms. Then, given an input video and the action proposals, we extract spatiotemporal features using deep neural networks. Finally, a classifier based on multiple-instance learning is trained to label each action tube as violent or non-violent. We obtain results comparable to the state of the art on three public datasets (Hockey Fight, RLVSD, and RWF-2000), achieving accuracies of 97.3%, 92.88%, and 88.7%, respectively.
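To make the weakly supervised step concrete, below is a minimal multiple-instance learning sketch in PyTorch: each video is a bag of action-tube features, each tube receives a score, and the video-level prediction is the maximum tube score trained against the video-level label. The feature dimension, scorer architecture, and max aggregation are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal MIL sketch (assumed PyTorch implementation, not the authors' code).
import torch
import torch.nn as nn

class TubeMILClassifier(nn.Module):
    """Scores each action-tube feature and aggregates them into a video-level score."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, tube_feats):
        # tube_feats: (num_tubes, feat_dim) spatiotemporal features, one per action tube.
        tube_scores = self.scorer(tube_feats).squeeze(-1)   # (num_tubes,)
        video_score, _ = tube_scores.max(dim=0)             # MIL: the bag is violent if any tube is.
        return video_score, tube_scores

model = TubeMILClassifier(feat_dim=512)
criterion = nn.BCEWithLogitsLoss()

# One training step using only a video-level label (1 = violent, 0 = non-violent).
tube_feats = torch.randn(8, 512)        # e.g., 8 proposal tubes from one video (dummy features)
video_label = torch.tensor(1.0)
video_score, tube_scores = model(tube_feats)
loss = criterion(video_score, video_label)
loss.backward()
# At inference, the per-tube scores provide the spatial and temporal localization of violence.
```

Max aggregation is only one common MIL choice; attention-based or top-k pooling over tube scores would fit the same weakly supervised setup.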

Keywords: dynamic image; spatiotemporal violence detection; video surveillance; violence detection; weakly supervised.

MeSH terms

  • Algorithms
  • Humans
  • Motion
  • Neural Networks, Computer*
  • Pattern Recognition, Automated* / methods
  • Violence