Human Pose Estimation from Monocular Images: A Comprehensive Survey

Wenjuan Gong; Xuena Zhang; Jordi Gonzàlez; Andrews Sobral; Thierry Bouwmans; Changhe Tu; El-Hadi Zahzah

doi:10.3390/s16121966

Sensors (Basel). 2016 Nov 25;16(12):1966. doi: 10.3390/s16121966.

Authors

Wenjuan Gong¹, Xuena Zhang², Jordi Gonzàlez³, Andrews Sobral⁴ ⁵, Thierry Bouwmans⁶, Changhe Tu⁷, El-Hadi Zahzah⁸

Affiliations

¹ Department of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China. wenjuangong@upc.edu.cn.
² Department of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China. xuena_zhanghh@163.com.
³ Computer Vision Center, University Autònoma de Barcelona, 08193 Catalonia, Spain. poal@cvc.uab.es.
⁴ Laboratory MIA, University of La Rochelle, 17042 La Rochelle CEDEX, France. andrews.sobral@univ-lr.fr.
⁵ Laboratory L3i, University of La Rochelle, 17042 La Rochelle CEDEX, France. andrews.sobral@univ-lr.fr.
⁶ Laboratory MIA, University of La Rochelle, 17042 La Rochelle CEDEX, France. thierry.bouwmans@univ-lr.fr.
⁷ School of Computer Science and Technology, Shandong University, Jinan 250100, China. chtu@sdu.edu.cn.
⁸ Laboratory L3i, University of La Rochelle, 17042 La Rochelle CEDEX, France. ezahzah@univ-lr.fr.

Abstract

Human pose estimation is the task of locating body parts in an image and determining how they are connected. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation exist in the literature, but each focuses on a single category, such as model-based approaches or human motion analysis. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advances in deep learning have produced novel algorithms for this problem. This paper presents a comprehensive survey of human pose estimation from monocular images, covering both milestone works and recent advances. Following a standard pipeline for solving computer vision problems, the survey splits the problem into several modules: feature extraction and description, human body models, and modeling methods. Modeling methods are categorized in two ways: as top-down versus bottom-up methods, and as generative versus discriminative methods. Because one direct application of human pose estimation is to provide initialization for automatic video surveillance, each module includes an additional section on motion-related methods: motion features, motion models, and motion-based methods. Finally, the paper collects 26 publicly available data sets for validation and describes frequently used error measurement methods.
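
The definition above (body part locations plus their connections) can be illustrated with a minimal sketch. The abstract does not specify a pose representation or name a particular error measure, so everything here is an assumption made for illustration: the `Pose2D` class, the joint names, and the `mean_joint_error` function (an average Euclidean distance between predicted and annotated joints, one common style of error measurement) are hypothetical and not taken from the survey.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Pose2D:
    """Hypothetical 2D pose: named joint locations plus their connections."""
    joints: Dict[str, Point]            # joint name -> (x, y) in pixels
    limbs: Tuple[Tuple[str, str], ...]  # pairs of connected joint names


def mean_joint_error(predicted: Pose2D, ground_truth: Pose2D) -> float:
    """Average Euclidean distance over joints present in both poses.

    Assumed error measure for illustration only; the abstract does not
    say which measures the survey covers.
    """
    shared = predicted.joints.keys() & ground_truth.joints.keys()
    if not shared:
        raise ValueError("poses share no joints")
    total = sum(dist(predicted.joints[n], ground_truth.joints[n]) for n in shared)
    return total / len(shared)


# Toy example with made-up coordinates.
truth = Pose2D(joints={"head": (0.0, 0.0), "neck": (0.0, 1.0)},
               limbs=(("head", "neck"),))
guess = Pose2D(joints={"head": (3.0, 4.0), "neck": (0.0, 1.0)},
               limbs=(("head", "neck"),))
error = mean_joint_error(guess, truth)
```

In this toy case the head is off by 5 pixels and the neck is exact, so the mean error over the two joints is 2.5 pixels.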

Keywords: bottom-up methods; discriminative methods; generative methods; human body models; human pose estimation; top-down methods.

MeSH terms

Algorithms
Humans
Image Processing, Computer-Assisted
Pattern Recognition, Automated
Posture / physiology*