An enhanced real-time human pose estimation method based on modified YOLOv8 framework

Chengang Dong; Guodong Du

doi:10.1038/s41598-024-58146-z

Sci Rep. 2024 Apr 5;14(1):8012. doi: 10.1038/s41598-024-58146-z.

Authors

Chengang Dong¹, Guodong Du²

Affiliations

¹ Nanjing University of Aeronautics and Astronautics, Nanjing, 210000, Jiangsu, China.
² Nanjing University of Aeronautics and Astronautics, Nanjing, 210000, Jiangsu, China. andrew_du@foxmail.com.

Abstract

Deep learning-based human pose estimation (HPE) aims to accurately estimate and predict human body posture in images or videos using deep neural networks. However, the accuracy of real-time HPE still needs improvement because of factors such as partial occlusion of body parts and the limited receptive field of the model. To alleviate the accuracy loss caused by these issues, this paper proposes a real-time HPE model called CCAM-Person, based on the YOLOv8 framework. First, we improve the backbone and neck of the YOLOv8x-pose real-time HPE model to alleviate feature loss and receptive field constraints. Second, we introduce the context coordinate attention module (CCAM) to strengthen the model's focus on salient features, reduce background noise interference, alleviate keypoint regression failures caused by limb occlusion, and improve the accuracy of pose estimation. Our approach attains competitive results on multiple metrics on two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, CCAM-Person improves average precision by 2.8% and 3.5% on the two datasets, respectively.
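For context on the average precision figures: keypoint AP on MS COCO is conventionally computed by thresholding the object keypoint similarity (OKS) between a predicted and a ground-truth pose. The sketch below follows the standard COCO definition of OKS and is not taken from the paper. The function name `oks`, the sample coordinates, and the three per-keypoint constants are illustrative assumptions.

```python
import math

def oks(pred, gt, visible, area, sigmas):
    """Object keypoint similarity for one predicted / ground-truth pose pair.

    pred, gt : lists of (x, y) keypoint coordinates, same length and order
    visible  : ground-truth visibility flags (0 = not labeled, >0 = labeled)
    area     : ground-truth object area in pixels (the scale term s^2)
    sigmas   : per-keypoint falloff constants (COCO uses k_i = 2 * sigma_i)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2   # squared pixel distance
        k2 = (2.0 * sigma) ** 2                # squared per-keypoint constant
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0

# Hypothetical three-keypoint pose; sigma values are assumed for illustration.
gt_pose = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
shifted = [(x + 1.0, y) for x, y in gt_pose]
sigmas = [0.026, 0.025, 0.035]

perfect_score = oks(gt_pose, gt_pose, [2, 2, 2], 100.0, sigmas)
shifted_score = oks(shifted, gt_pose, [2, 2, 2], 100.0, sigmas)
```

A prediction counts as a true positive at a given threshold when its OKS exceeds that threshold; under the COCO convention, AP is then averaged over thresholds from 0.50 to 0.95 in steps of 0.05. Occluded or unlabeled keypoints are skipped, so the score reflects only the joints that carry a ground-truth label.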

Keywords: Attention mechanisms; Deep learning; Feature pyramid network; Human pose estimation; YOLOv8.

MeSH terms

Benchmarking*
Extremities*
Humans
Neural Networks, Computer
Posture
Videotape Recording