Fight Recognition in Video Using Hough Forests and 2D Convolutional Neural Network

IEEE Trans Image Process. 2018 Oct;27(10):4787-4797. doi: 10.1109/TIP.2018.2845742. Epub 2018 Jun 8.

Abstract

While action recognition has become an important line of research in computer vision, the recognition of particular events such as aggressive behaviors, or fights, has been relatively less studied. Recognizing such events may be extremely useful in several video surveillance scenarios, such as psychiatric wards and prisons, or even on personal camera smartphones. This potential usability has led to a surge of interest in developing fight or violence detectors. One of the key aspects in this case is efficiency, that is, these methods should be computationally fast. "Handcrafted" spatiotemporal features that account for both motion and appearance information can achieve high accuracy rates, although the computational cost of extracting some of those features is still prohibitive for practical applications. The deep learning paradigm has also recently been applied to this task for the first time, in the form of a 3D Convolutional Neural Network that processes the whole video sequence as input. However, results on human perception of others' actions suggest that, in this specific task, motion features are crucial. This means that using the whole video as input may add both redundancy and noise to the learning process. In this work, we propose a hybrid "handcrafted/learned" feature framework which provides better accuracy than the previous feature learning method, with similar computational efficiency. The proposed method is evaluated on three related benchmark datasets. It outperforms the state-of-the-art methods on two of the three benchmark datasets considered.

MeSH terms

  • Human Activities / classification*
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Neural Networks, Computer*
  • Pattern Recognition, Automated / methods*
  • Video Recording
  • Violence / classification*