A deep neural network model for multi-view human activity recognition

PLoS One. 2022 Jan 7;17(1):e0262181. doi: 10.1371/journal.pone.0262181. eCollection 2022.

Abstract

Multiple cameras are used to resolve the occlusion problem that often occurs in single-view human activity recognition. Building on the success of representation learning with deep neural networks (DNNs), recent works have proposed DNN models to estimate human activity from multi-view inputs. However, currently available datasets are inadequate for training DNN models to a high accuracy rate. To address this issue, this study presents a DNN model, trained by employing transfer learning and shared-weight techniques, to classify human activity from multiple cameras. The model comprises pre-trained convolutional neural networks (CNNs), attention layers, long short-term memory networks with residual learning (LSTMRes), and Softmax layers. The experimental results suggest that the proposed model achieves promising performance on challenging multi-view human activity recognition (MVHAR) datasets: IXMAS (97.27%) and i3DPost (96.87%). A competitive recognition rate was also observed in online classification.
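The data flow named in the abstract (a shared-weight encoder applied to every camera view, an attention step, a recurrent layer with a residual connection, and a Softmax output) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how views are fused or where attention is applied, so attention over views is an assumption here. The class `MultiViewSketch`, its parameters, the random weights, the linear-plus-ReLU encoder standing in for a pre-trained CNN, and the plain tanh cell standing in for the LSTM are all hypothetical choices made to keep the example runnable with the Python standard library.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng):
    """Random weights; a real model would load trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MultiViewSketch:
    """Hypothetical toy model: shared encoder -> view attention ->
    residual recurrent cell -> softmax. All names are illustrative."""

    def __init__(self, in_dim, feat_dim, n_classes, seed=0):
        rng = random.Random(seed)
        # One encoder reused for every camera view (shared weights).
        self.W_enc = rand_matrix(feat_dim, in_dim, rng)
        # Scoring vector for attention over views (assumed form).
        self.v_att = [rng.uniform(-0.5, 0.5) for _ in range(feat_dim)]
        # Plain tanh recurrent cell standing in for the LSTM.
        self.W_x = rand_matrix(feat_dim, feat_dim, rng)
        self.W_h = rand_matrix(feat_dim, feat_dim, rng)
        self.W_out = rand_matrix(n_classes, feat_dim, rng)

    def encode(self, frame):
        # Linear map + ReLU as a placeholder for a pre-trained CNN.
        return [max(0.0, v) for v in matvec(self.W_enc, frame)]

    def fuse_views(self, frames):
        # The same encoder processes each view, then attention weights
        # the per-view features and sums them into one vector.
        feats = [self.encode(f) for f in frames]
        scores = [sum(a * b for a, b in zip(self.v_att, f)) for f in feats]
        weights = softmax(scores)
        fused = [sum(w * f[i] for w, f in zip(weights, feats))
                 for i in range(len(feats[0]))]
        return fused, weights

    def forward(self, clip):
        # clip[t][v] is the input vector at time step t for view v.
        h = [0.0] * len(self.W_h)
        for frames in clip:
            x, _ = self.fuse_views(frames)
            a, b = matvec(self.W_x, x), matvec(self.W_h, h)
            cell = [math.tanh(p + q) for p, q in zip(a, b)]
            # Residual connection: add the cell input to its output.
            h = [c + xi for c, xi in zip(cell, x)]
        return softmax(matvec(self.W_out, h))


# Usage: 5 time steps, 4 camera views, 8-dimensional toy inputs.
rng = random.Random(1)
clip = [[[rng.random() for _ in range(8)] for _ in range(4)] for _ in range(5)]
model = MultiViewSketch(in_dim=8, feat_dim=6, n_classes=3)
probs = model.forward(clip)
fused, view_weights = model.fuse_views(clip[0])
```

Reusing a single encoder across views keeps the parameter count independent of the number of cameras, which is one plausible reason shared weights help when training data are scarce.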

MeSH terms

  • Attention / physiology*
  • Human Activities / statistics & numerical data*
  • Humans
  • Learning / physiology*
  • Memory / physiology*
  • Neural Networks, Computer*
  • Recognition, Psychology*

Grants and funding

This work was supported by JSPS KAKENHI under Grant Nos. 26285212 (K.S) and 18H01041 (K.Sh). JSPS: https://www.jsps.go.jp/.