Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition

Sensors (Basel). 2019 Apr 2;19(7):1599. doi: 10.3390/s19071599.

Abstract

Human action recognition has attracted significant attention in the research community due to its emerging applications. A variety of approaches have been proposed to address this problem; however, several issues remain open. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-Resnet-v2. Second, we introduce a novel handcrafted feature descriptor, the Weber's law based Volume Local Gradient Ternary Pattern (WVLGTP), which captures spatiotemporal features and also encodes shape information through a gradient operation. Furthermore, a Weber's law based threshold and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. A multi-resolution extension of WVLGTP based on an averaging scheme is also presented. Afterward, both extracted feature sets are concatenated and fed to a Support Vector Machine to perform the classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.
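The fusion-and-classification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: random arrays stand in for the real Inception-Resnet-v2 activations and WVLGTP histograms, and the feature dimensions (1536 for the deep stream, 256 for the handcrafted stream), the SVM kernel, and the use of feature standardization are all assumptions, since the abstract specifies only concatenation followed by an SVM.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the two feature streams (dimensions are assumptions):
# deep spatial features from a CNN such as Inception-Resnet-v2, and
# handcrafted spatiotemporal WVLGTP descriptors per video.
n_videos, n_classes = 60, 3
deep_feats = rng.normal(size=(n_videos, 1536))
wvlgtp_feats = rng.normal(size=(n_videos, 256))
labels = rng.integers(0, n_classes, size=n_videos)

# Feature-level fusion: simple concatenation of the two descriptors.
fused = np.hstack([deep_feats, wvlgtp_feats])

# Standardize and classify with an SVM (kernel choice is an assumption;
# the abstract does not state which kernel was used).
scaler = StandardScaler().fit(fused)
clf = SVC(kernel="linear").fit(scaler.transform(fused), labels)

preds = clf.predict(scaler.transform(fused))
print(fused.shape)
```

In practice the two streams would come from a pretrained CNN's pooled activations and from the WVLGTP descriptor computed over the video volume; the fusion itself is a plain concatenation along the feature axis.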

Keywords: Inception-Resnet-v2; Weber’s law based volume local gradient ternary pattern; deep spatial features; spatiotemporal features.

MeSH terms

  • Algorithms
  • Biosensing Techniques*
  • Human Activities*
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Monitoring, Physiologic
  • Neural Networks, Computer
  • Pattern Recognition, Automated / methods
  • Support Vector Machine