A Sampling Approach to Generating Closely Interacting 3D Pose-Pairs from 2D Annotations

IEEE Trans Vis Comput Graph. 2019 Jun;25(6):2217-2227. doi: 10.1109/TVCG.2018.2832097. Epub 2018 May 1.

Abstract

We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs for a given motion category, e.g., wrestling or salsa dance. Since close interactions are difficult to acquire with 3D sensors, our approach utilizes abundant existing video data covering many human activities. Instead of treating the data generation problem as one of reconstruction, either through 3D acquisition or direct 2D-to-3D data lifting from video annotations, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. Given a motion category and a set of video frames depicting the motion, with the 2D pose-pair in each frame annotated, we start the sampling with one or a few seed 3D pose-pairs that are manually created for the target motion category. The initial set is then augmented by MCMC sampling around the seeds, via the Metropolis-Hastings algorithm, guided by a probability density function (PDF) defined by two terms that bias the sampling towards 3D pose-pairs that are physically valid and plausible for the motion category. To sample efficiently over the space of close interactions, rather than over pose spaces, we develop a novel representation called interaction coordinates (IC) that encodes both poses and their interactions in an integrated manner. The plausibility of a 3D pose-pair is then defined in terms of the IC and with respect to the annotated 2D pose-pairs from the video. We show that our sampling-based approach efficiently synthesizes a large volume of plausible, closely interacting 3D pose-pairs that provide good coverage of the input 2D pose-pairs.
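To make the sampling procedure concrete, the following is a minimal Python sketch of Metropolis-Hastings sampling around a seed 3D pose-pair, guided by an unnormalized density built from a physical-validity term and a category-plausibility term. All names, array shapes, the Gaussian proposal, and the placeholder energy terms are illustrative assumptions; the paper's actual method operates on the interaction coordinates (IC) representation and uses more elaborate validity and plausibility terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def physical_validity(pose_pair):
    # Placeholder term: in the paper this would check physical validity
    # (e.g., joint limits, no interpenetration); here it is a simple Gaussian prior.
    return np.exp(-0.5 * np.sum(pose_pair ** 2))

def category_plausibility(pose_pair, annotated_2d):
    # Placeholder term: compare a crude orthographic projection of the 3D
    # pose-pair against the annotated 2D pose-pairs for the motion category.
    proj = pose_pair[..., :2]
    dists = [np.sum((proj - a) ** 2) for a in annotated_2d]
    return np.exp(-0.5 * min(dists))

def target_pdf(pose_pair, annotated_2d):
    # Unnormalized PDF: product of the two biasing terms.
    return physical_validity(pose_pair) * category_plausibility(pose_pair, annotated_2d)

def metropolis_hastings(seed_pair, annotated_2d, n_samples=1000, step=0.05):
    """Sample 3D pose-pairs around a manually created seed pose-pair."""
    samples = [seed_pair]
    current = seed_pair
    p_current = target_pdf(current, annotated_2d)
    for _ in range(n_samples):
        # Symmetric Gaussian random-walk proposal.
        proposal = current + step * rng.standard_normal(current.shape)
        p_proposal = target_pdf(proposal, annotated_2d)
        # Standard Metropolis acceptance ratio (proposal is symmetric).
        if rng.random() < min(1.0, p_proposal / (p_current + 1e-12)):
            current, p_current = proposal, p_proposal
        samples.append(current)
    return samples

# Toy usage: two people x 15 joints x 3 coordinates, with fake 2D annotations.
seed = rng.standard_normal((2, 15, 3))
annotations_2d = [rng.standard_normal((2, 15, 2)) for _ in range(5)]
augmented_set = metropolis_hastings(seed, annotations_2d, n_samples=200)
print(len(augmented_set), "sampled pose-pairs")
```

The key design choice mirrored here is that the target density is only ever evaluated up to a constant, which is all Metropolis-Hastings requires; the two factors simply bias accepted samples towards pose-pairs that are both physically valid and consistent with the annotated 2D evidence.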