Whole and Part Adaptive Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition

Sensors (Basel). 2020 Dec 13;20(24):7149. doi: 10.3390/s20247149.

Abstract

Spatiotemporal graph convolution has made significant progress in skeleton-based action recognition in recent years. Most existing graph convolution methods model all the joints of the human skeleton as a single graph, which ignores the differences in movement patterns among body parts and fails to capture the relationships between those parts. To capture both the distinctive features of different parts of human skeleton data and the correlations among them, we propose two new graph convolution methods: the whole graph convolution network (WGCN) and the part graph convolution network (PGCN). WGCN learns whole-scale spatiotemporal skeleton features according to the movement patterns and physical structure of the human skeleton. PGCN divides the human skeleton graph into several subgraphs to learn part-scale spatiotemporal features. Moreover, we propose an adaptive fusion module that adaptively fuses the two complementary features multiple times to obtain more effective skeleton features. By coupling these proposals, we build a whole and part adaptive fusion graph convolutional network (WPGCN) that outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
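
The whole-scale and part-scale idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 6-joint toy skeleton, the part grouping, the weight-free neighborhood averaging that stands in for a graph convolution, and the single sigmoid gate `alpha` used for fusion. The abstract does not specify the actual WGCN, PGCN, or fusion formulas, so this should not be read as the authors' implementation.

```python
import math

# Toy 6-joint skeleton (hypothetical, not a dataset layout):
# 0 = torso, 1 = head, 2-3 = left arm, 4-5 = right arm.
EDGES = [(0, 1), (0, 2), (2, 3), (0, 4), (4, 5)]
PARTS = {"trunk": [0, 1], "left_arm": [2, 3], "right_arm": [4, 5]}
NUM_JOINTS = 6


def normalized_adjacency(edges, num_joints):
    """Row-normalized adjacency with self-loops: each row sums to 1."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def part_edges(edges, parts):
    """Keep only edges whose two endpoints lie in the same part."""
    part_of = {j: name for name, joints in parts.items() for j in joints}
    return [(i, j) for i, j in edges if part_of[i] == part_of[j]]


def aggregate(adj, feats):
    """One neighborhood-averaging step, adj @ feats (no learned weights)."""
    return [[sum(a * feats[k][c] for k, a in enumerate(row))
             for c in range(len(feats[0]))]
            for row in adj]


def adaptive_fuse(whole, part, alpha):
    """Blend the two feature sets with an assumed gate g = sigmoid(alpha)."""
    g = 1.0 / (1.0 + math.exp(-alpha))
    return [[g * w + (1.0 - g) * p for w, p in zip(wr, pr)]
            for wr, pr in zip(whole, part)]


# One feature channel per joint, e.g. a single coordinate at one frame.
feats = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]

# Whole scale: every joint aggregates over the full skeleton graph.
whole_adj = normalized_adjacency(EDGES, NUM_JOINTS)
whole_out = aggregate(whole_adj, feats)

# Part scale: edges crossing part boundaries are dropped, so each
# subgraph aggregates only over its own joints.
part_adj = normalized_adjacency(part_edges(EDGES, PARTS), NUM_JOINTS)
part_out = aggregate(part_adj, feats)

# alpha is fixed here; in a trained model it would be a learned parameter.
fused = adaptive_fuse(whole_out, part_out, alpha=0.0)  # g = 0.5
```

In this toy setup the torso joint averages over its neighbors in all three parts at the whole scale, but only over the trunk at the part scale, which is the kind of difference a fusion step is meant to reconcile. The temporal dimension is omitted entirely.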

Keywords: graph convolutional network; skeleton-based human action recognition; whole and part adaptive fusion.

MeSH terms

  • Humans
  • Neural Networks, Computer*
  • Skeleton*