Self-Supervised Action Representation Learning Based on Asymmetric Skeleton Data Augmentation

Sensors (Basel). 2022 Nov 20;22(22):8989. doi: 10.3390/s22228989.

Abstract

Contrastive learning has received increasing attention in skeleton-based action representation learning in recent years. Most contrastive learning methods construct positive sample pairs with simple augmentation strategies. Representations learned from such pairs fail to capture deeper feature information, which degrades performance on downstream tasks. To address this limited learning ability, we propose an asymmetric data augmentation strategy and apply it to the training of 3D skeleton-based action representations. First, we study the characteristics of the different skeleton views and choose a specific augmentation method for a given view. Second, we incorporate these view-specific augmentation methods into the left and right branches of an asymmetric data augmentation pipeline, which makes the contrastive learning task harder to converge and thereby significantly improves the quality of the learned action representations. Finally, because many augmentation methods act directly on the joint view, the augmented samples can differ greatly from the original samples; we therefore use random probability activation when transforming the joint view to avoid extreme augmentation. Extensive experiments on the NTU RGB+D datasets show that our method is effective.
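The asymmetric pipeline with random probability activation described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the individual transforms, so `shear`, `temporal_flip`, and `joint_mask`, the probability 0.5, and the assignment of transforms to branches are all hypothetical choices, and only the joint view, stored as an array of shape (frames, joints, 3), is handled.

```python
import numpy as np

def shear(x, rng, beta=0.5):
    # Assumed spatial transform: random shear of the 3D joint coordinates.
    s = rng.uniform(-beta, beta, size=6)
    m = np.array([[1.0, s[0], s[1]],
                  [s[2], 1.0, s[3]],
                  [s[4], s[5], 1.0]])
    return x @ m.T

def temporal_flip(x, rng):
    # Assumed temporal transform: reverse the frame order.
    return x[::-1].copy()

def joint_mask(x, rng, n_mask=5):
    # Assumed masking transform: zero out n_mask randomly chosen joints.
    out = x.copy()
    idx = rng.choice(x.shape[1], size=n_mask, replace=False)
    out[:, idx, :] = 0.0
    return out

def maybe(transform, p):
    # Random probability activation: apply the transform with probability p,
    # otherwise pass the sample through unchanged.
    def wrapped(x, rng):
        return transform(x, rng) if rng.random() < p else x
    return wrapped

def compose(*transforms):
    # Chain transforms into a single augmentation pipeline.
    def pipeline(x, rng):
        for t in transforms:
            x = t(x, rng)
        return x
    return pipeline

# Asymmetric branches (hypothetical split): the right branch applies a
# longer chain of transforms than the left branch.
left_branch = compose(maybe(shear, 0.5))
right_branch = compose(maybe(shear, 0.5),
                       maybe(temporal_flip, 0.5),
                       maybe(joint_mask, 0.5))

def make_positive_pair(x, rng):
    # Two differently augmented versions of the same skeleton sequence.
    return left_branch(x, rng), right_branch(x, rng)

rng = np.random.default_rng(0)
sequence = rng.normal(size=(16, 25, 3))  # 16 frames, 25 joints, xyz
view_a, view_b = make_positive_pair(sequence, rng)
```

Because each transform fires only with some probability, a sample rarely receives every transform at once, which is one plausible way to keep the joint view from drifting too far from the original sequence.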

Keywords: action representation; contrastive learning; data augmentation; self-supervised.

MeSH terms

  • Learning
  • Machine Learning*
  • Problem-Based Learning*
  • Skeleton

Grants and funding

This research received no external funding.