Sample Fusion Network: An End-to-End Data Augmentation Network for Skeleton-Based Human Action Recognition

IEEE Trans Image Process. 2019 Nov;28(11):5281-5295. doi: 10.1109/TIP.2019.2913544. Epub 2019 May 2.

Abstract

Data augmentation is a widely used technique for enhancing the generalization ability of deep neural networks in skeleton-based human action recognition (HAR) tasks. Most existing data augmentation methods generate new samples by means of handcrafted transforms. However, because these transforms have no learnable parameters, they cannot be trained and are discarded at test time. To address these problems, a novel type of data augmentation network called a sample fusion network (SFN) is proposed. Instead of using handcrafted transforms, an SFN generates new samples via a long short-term memory (LSTM) autoencoder (AE) network. An SFN and an HAR network can therefore be cascaded to form a combined network that is trained in an end-to-end manner. Moreover, an adaptive weighting strategy is employed to improve the complementarity between a sample and the new sample generated from it by the SFN, allowing the SFN to more effectively improve the performance of the HAR network during testing. Experimental results on various datasets verify that the proposed method outperforms state-of-the-art data augmentation methods. More importantly, the proposed SFN architecture is a general framework that can be integrated with various types of HAR networks. For example, with a baseline HAR model consisting of three LSTM layers and one fully connected (FC) layer, the classification accuracy on the NTU RGB+D dataset under the cross-view protocol increased from 79.53% to 90.75%, outperforming most other methods.
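
The abstract states that the SFN remains in use at test time, with a sample and its SFN-generated counterpart complementing each other, but it does not say how the two are combined. The following is a minimal sketch of one possible reading, not the authors' method: it assumes the HAR network yields one class-score vector for each of the two samples and that their softmax probabilities are averaged with a weight. The names `fuse_predictions` and `predict_class` and the fixed weight `alpha` are hypothetical; `alpha` merely stands in for whatever the adaptive weighting strategy would supply.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(scores_original, scores_generated, alpha=0.5):
    """Weighted average of class probabilities for a sample and its generated counterpart.

    ASSUMPTION: `alpha` is a fixed illustrative weight on the original sample;
    the abstract does not define how the adaptive weight is computed or applied.
    """
    if len(scores_original) != len(scores_generated):
        raise ValueError("score vectors must have the same number of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_original = softmax(scores_original)
    p_generated = softmax(scores_generated)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_original, p_generated)]


def predict_class(scores_original, scores_generated, alpha=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_predictions(scores_original, scores_generated, alpha)
    return max(range(len(fused)), key=fused.__getitem__)
```

Under this assumption, setting `alpha` to 1.0 ignores the generated sample and recovers the plain HAR prediction, which makes the contribution of the generated sample easy to isolate in an ablation.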

MeSH terms

  • Algorithms
  • Databases, Factual
  • Human Activities / classification*
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Neural Networks, Computer*
  • Video Recording