Masked Kinematic Continuity-aware Hierarchical Attention Network for pose estimation in videos

Neural Netw. 2024 Jan:169:282-292. doi: 10.1016/j.neunet.2023.10.038. Epub 2023 Oct 27.

Abstract

Existing methods for estimating human poses from video content exploit the temporal features of video sequences and have shown impressive results. However, most methods address jitter and occlusion separately: they either compromise on accuracy to reduce jitter or require high-resolution images to deal with occlusion, which prevents full consideration of temporal features. Yet these two issues are interrelated. For example, occlusion causes uncertainty between successive frames, which leads to unsmooth results. To address these issues, we propose the Masked Kinematic Continuity-aware Hierarchical Attention Network (M-HANet), a novel framework that exploits masked kinematic keypoint features by extending our HANet framework. First, we randomly select a keypoint and mask it, treating the masked keypoint as if it were occluded, which makes the network resilient to occlusion. We also use the velocity and acceleration of each individual keypoint to effectively capture temporal features. Second, the proposed hierarchical transformer encoder refines a 2D or 3D input pose derived from existing estimators by aggregating the masked continuity of the spatiotemporal dependencies of human motion. Finally, to facilitate collaborative optimization, we perform online cross-supervision between the final pose from our decoder and the refined input pose produced by our encoder. We validate the effectiveness of our model on a variety of tasks, including 2D and 3D pose estimation, body mesh recovery, and sparsely annotated multi-human pose estimation, and show that the proposed approach improves PCK@0.05 by 14.1% and MPJPE by 8.7 mm compared to the existing method.
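
As a rough illustration of the ideas named in the abstract, the sketch below masks one randomly chosen keypoint, derives per-keypoint velocity and acceleration by finite differences, and computes MPJPE as the mean Euclidean joint error. It is not the authors' implementation: the function names, the `(frames, keypoints, dims)` array layout, zero-filling as the masking strategy, and zero-padding of the first frames are all assumptions made for illustration.

```python
import numpy as np


def mask_random_keypoint(poses, rng):
    """Zero out one randomly chosen keypoint in every frame (assumed masking scheme).

    poses: array of shape (T, K, D) = (frames, keypoints, coordinate dims).
    Returns the masked copy and the index of the masked keypoint.
    """
    masked = poses.copy()
    k = int(rng.integers(poses.shape[1]))
    masked[:, k, :] = 0.0
    return masked, k


def kinematic_features(poses):
    """Per-keypoint velocity and acceleration via first and second differences.

    Frames without enough history are left at zero (an assumption).
    """
    velocity = np.zeros_like(poses)
    velocity[1:] = poses[1:] - poses[:-1]
    acceleration = np.zeros_like(poses)
    acceleration[2:] = velocity[2:] - velocity[1:-1]
    return velocity, acceleration


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over frames and joints."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())
```

A model input could then be formed by concatenating the masked poses with their velocity and acceleration along the last axis, although how M-HANet actually combines these features is not stated in the abstract.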

Keywords: Body mesh recovery; Pose estimation; Transformer; Video understanding.

MeSH terms

  • Biomechanical Phenomena
  • Humans
  • Motion
  • Resilience, Psychological*
  • Uncertainty