Deep Learning of Fuzzy Weighted Multi-Resolution Depth Motion Maps with Spatial Feature Fusion for Action Recognition

Mahmoud Al-Faris; John Chiverton; Yanyan Yang; David Ndzi

doi:10.3390/jimaging5100082


J Imaging. 2019 Oct 21;5(10):82. doi: 10.3390/jimaging5100082.

Authors

Mahmoud Al-Faris¹, John Chiverton¹, Yanyan Yang², David Ndzi³

Affiliations

¹ School of Energy and Electronic Engineering, University of Portsmouth, Portsmouth PO1 3DJ, UK.
² School of Computing, University of Portsmouth, Portsmouth PO1 3DJ, UK.
³ School of Computing, Engineering and Physical Sciences, University of the West of Scotland, Paisley PA1 2BE, UK.

Abstract

Human action recognition (HAR) is an important yet challenging task. This paper presents a novel HAR method in which fuzzy weight functions are incorporated into the computation of depth motion maps (DMMs) and motion information is accumulated over multiple temporal lengths. The resulting features are referred to as fuzzy weighted multi-resolution DMMs (FWMDMMs). This formulation allows different aspects of individual actions to be emphasised and helps to characterise the importance of the temporal dimension. This in turn helps to overcome problems such as variation in the time over which a single type of action is performed. A deep convolutional neural network (CNN) motion model is created and trained to extract discriminative and compact features. Transfer learning with the AlexNet network is also used to extract spatial information from RGB and depth data. Different late fusion techniques are then investigated to fuse the deep motion model with the spatial network, yielding a spatial-temporal HAR model. The developed approach is capable of recognising both human actions and human-object interactions. Three public domain datasets are used to evaluate the proposed solution, and the experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.
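The abstract does not give the exact FWMDMM formulation, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It follows the conventional DMM idea of accumulating absolute differences between consecutive depth frames, and assumes that (a) each frame difference is scaled by a triangular fuzzy membership function over normalised time, (b) "multi-resolution" means computing maps from the sequence sampled at several temporal strides, and (c) late fusion is a simple score-level rule (average, product or max) over the class probabilities of the motion and spatial networks. Only the front view is computed; side and top projections are omitted. All function names, strides, centres and widths are hypothetical.

```python
import numpy as np


def triangular_membership(t, center, width):
    """Hypothetical triangular fuzzy membership for normalised time t in [0, 1]."""
    return np.clip(1.0 - np.abs(t - center) / width, 0.0, 1.0)


def fuzzy_weighted_dmm(frames, center, width=0.5):
    """Front-view DMM where each frame difference is scaled by a fuzzy weight.

    frames: array-like of shape (N, H, W) holding N >= 2 depth maps.
    Assumption: weights depend only on the normalised time of each difference.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.shape[0] < 2:
        raise ValueError("at least two frames are required")
    diffs = np.abs(np.diff(frames, axis=0))            # (N-1, H, W) motion energy
    t = np.linspace(0.0, 1.0, num=diffs.shape[0])      # normalised time of each difference
    weights = triangular_membership(t, center, width)  # (N-1,) fuzzy weights
    return np.tensordot(weights, diffs, axes=1)        # weighted sum over time -> (H, W)


def multi_resolution_fwdmm(frames, strides=(1, 2, 4), centers=(0.0, 0.5, 1.0), width=0.5):
    """Stack fuzzy weighted DMMs over several temporal strides and membership centres.

    Assumption: "multi-resolution" is modelled as temporal subsampling by each stride.
    """
    frames = np.asarray(frames, dtype=np.float64)
    maps = []
    for s in strides:
        sub = frames[::s]                 # sequence sampled at a coarser temporal rate
        if sub.shape[0] < 2:
            continue                      # too few frames left at this stride
        for c in centers:                 # early, middle and late emphasis
            maps.append(fuzzy_weighted_dmm(sub, c, width))
    return np.stack(maps)


def late_fuse(p_motion, p_spatial, method="average"):
    """Score-level fusion of two class-probability vectors; returns the predicted class."""
    p_motion = np.asarray(p_motion, dtype=np.float64)
    p_spatial = np.asarray(p_spatial, dtype=np.float64)
    if method == "average":
        fused = (p_motion + p_spatial) / 2.0
    elif method == "product":
        fused = p_motion * p_spatial
    elif method == "max":
        fused = np.maximum(p_motion, p_spatial)
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return int(np.argmax(fused))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((20, 48, 64))           # 20 synthetic depth frames
    features = multi_resolution_fwdmm(video)
    print(features.shape)                      # (9, 48, 64): 3 strides x 3 centres
    print(late_fuse([0.9, 0.1], [0.4, 0.6]))   # 0
```

In such a pipeline the stacked maps would be the input to the CNN motion model, and its class scores would then be fused with those of the AlexNet-based spatial network. The triangular membership and the three fusion rules are chosen here only because they are simple and easy to inspect.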

Keywords: action recognition; feature fusion; multi-resolution; transfer learning.

Grants and funding

2015/Higher Committee for Education Development in Iraq