Keys for Action: An Efficient Keyframe-Based Approach for 3D Action Recognition Using a Deep Neural Network

Hashim Yasin; Mazhar Hussain; Andreas Weber

doi:10.3390/s20082226

Sensors (Basel). 2020 Apr 15;20(8):2226. doi: 10.3390/s20082226.

Authors

Hashim Yasin¹, Mazhar Hussain¹, Andreas Weber²

Affiliations

¹ Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan.
² Department of Computer Science II, Universität Bonn, 53115 Bonn, Germany.

Abstract

In this paper, we propose a novel and efficient framework for 3D action recognition using a deep learning architecture. First, we construct a 3D normalized pose space consisting only of 3D normalized poses, which are obtained by discarding translation and orientation information. From these poses, we extract joint features and feed them into a Deep Neural Network (DNN) to learn the action model. Our DNN consists of two hidden layers with the sigmoid activation function and an output layer with the softmax function. Furthermore, we propose a keyframe extraction method that efficiently selects, from a motion sequence of 3D frames, the keyframes that contribute substantially to the execution of the action. In this way, we eliminate redundant frames and shorten the motion; in effect, we summarize the motion sequence while preserving its original semantics. Only the remaining informative frames are used for action recognition, which makes the proposed pipeline sufficiently fast and robust. Finally, we evaluate the proposed framework extensively on the publicly available benchmark Motion Capture (MoCap) datasets HDM05 and CMU. Our experiments show that the proposed scheme significantly outperforms other state-of-the-art approaches.
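The abstract names three technical ingredients: pose normalization, a DNN with two sigmoid hidden layers and a softmax output, and keyframe selection. The plain-Python sketch below illustrates how these pieces could fit together, under stated assumptions rather than as the authors' implementation. It assumes y is the vertical axis and that joints 0, 1 and 2 are the root and the left and right hips (hypothetical indices, not the HDM05 or CMU skeleton layout). Keyframes are chosen with a simple distance-threshold rule that stands in for, and is not, the paper's keyframe extraction method. A sequence label is obtained by averaging class probabilities over the keyframes, which is also an assumption. All function names (`normalize_pose`, `select_keyframes`, `forward`, `classify_sequence`) and the `threshold` parameter are hypothetical. The weights are random, so only the forward pass is shown, not training.

```python
import math
import random

# A pose is a list of (x, y, z) joint positions; y is assumed to be vertical.


def normalize_pose(pose, root=0, left_hip=1, right_hip=2):
    """Discard translation and orientation (hypothetical joint indices)."""
    rx, ry, rz = pose[root]
    # Translation: move the root joint to the origin.
    centered = [(x - rx, y - ry, z - rz) for x, y, z in pose]
    # Orientation: angle of the left-to-right hip vector in the x-z plane.
    hx = centered[right_hip][0] - centered[left_hip][0]
    hz = centered[right_hip][2] - centered[left_hip][2]
    theta = math.atan2(hz, hx)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate about the vertical y axis so the hip vector lies along +x.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in centered]


def pose_features(pose):
    """Flatten a normalized pose into a joint-feature vector."""
    return [coord for joint in pose for coord in joint]


def select_keyframes(poses, threshold):
    """Hypothetical rule: keep a frame once it is far enough from the last keyframe."""
    keys = [0]
    for i in range(1, len(poses)):
        if math.dist(pose_features(poses[i]), pose_features(poses[keys[-1]])) > threshold:
            keys.append(i)
    return keys


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def dense(x, weights, bias):
    """Fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_dnn(sizes, rng):
    """Random (untrained) weights for layer sizes [inputs, h1, h2, classes]."""
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(x, layers):
    """Two sigmoid hidden layers followed by a softmax output layer."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = [sigmoid(v) for v in dense(x, w1, b1)]
    h2 = [sigmoid(v) for v in dense(h1, w2, b2)]
    return softmax(dense(h2, w3, b3))


def classify_sequence(poses, layers, threshold):
    """Normalize, keep keyframes only, and average their class probabilities (assumed)."""
    normalized = [normalize_pose(p) for p in poses]
    keys = select_keyframes(normalized, threshold)
    probs = [forward(pose_features(normalized[k]), layers) for k in keys]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    base = [(rng.uniform(-1, 1), rng.uniform(0, 2), rng.uniform(-1, 1)) for _ in range(5)]
    # Synthetic 20-frame motion in which only joint 4 (say, a hand) rises.
    motion = [base[:4] + [(base[4][0], base[4][1] + 0.05 * t, base[4][2])] for t in range(20)]
    layers = init_dnn([15, 8, 8, 3], rng)
    print(select_keyframes([normalize_pose(p) for p in motion], threshold=0.12))
    print(classify_sequence(motion, layers, threshold=0.12))
```

Restricting the orientation correction to the vertical axis keeps gravity-related cues, such as bending or lying down, in the features; the abstract does not state whether the paper normalizes orientation this way. In the synthetic demo, frames in which the hand has barely moved are skipped, so the classifier sees a shorter, summarized sequence.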

Keywords: action recognition; deep neural network (DNN); keyframe extraction; motion capture (MoCap) datasets.