Spatio-Temporal Calibration of Multiple Kinect Cameras Using 3D Human Pose

Sensors (Basel). 2022 Nov 17;22(22):8900. doi: 10.3390/s22228900.

Abstract

RGB and depth cameras are extensively used for the 3D tracking of human pose and motion. Typically, these cameras calculate a set of 3D points representing the human body as a skeletal structure. The tracking capabilities of a single camera are often affected by noise and inaccuracies due to occluded body parts. Multiple-camera setups offer a solution to maximize coverage of the captured human body and to minimize occlusions. According to best practices, fusing information across multiple cameras typically requires spatio-temporal calibration. First, the cameras must synchronize their internal clocks; this is typically performed by physically connecting the cameras to each other with an external device or cable. Second, the pose of each camera relative to the other cameras must be calculated (extrinsic calibration). State-of-the-art methods use specialized calibration sessions and devices, such as a checkerboard, to perform calibration. In this paper, we introduce an approach to the spatio-temporal calibration of multiple cameras that is designed to run on-the-fly without specialized devices or equipment, requiring only the motion of the human body in the scene. As an example, the system is implemented and evaluated using the Microsoft Azure Kinect. The study shows that the accuracy and robustness of this approach are on par with state-of-the-art practices.
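The extrinsic-calibration step described above amounts to finding the rigid transform between camera frames from corresponding 3D body points. As a minimal sketch of that idea (not the paper's actual pipeline), the snippet below estimates the rotation and translation mapping one camera's skeleton joints onto another's using the Kabsch algorithm; it assumes the joints have already been temporally matched across cameras, and the function name `estimate_extrinsics` is our own.

```python
import numpy as np

def estimate_extrinsics(joints_a, joints_b):
    """Estimate the rigid transform (R, t) such that R @ a + t ~ b,
    given time-matched (N, 3) arrays of 3D joint positions observed
    by camera A and camera B. Uses the Kabsch/SVD algorithm."""
    ca, cb = joints_a.mean(axis=0), joints_b.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (joints_a - ca).T @ (joints_b - cb)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t
```

In practice one would feed many joints accumulated over many frames and wrap the solver in an outlier-robust scheme (e.g. RANSAC), since skeletal trackers mislabel occluded joints; the sketch shows only the closed-form core.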

Keywords: 3D human pose estimation; Azure Kinect; depth sensor; extrinsic calibration; motion capture; multiple-camera setup; synchronization.

MeSH terms

  • Calibration*
  • Humans
  • Motion

Grants and funding

This research received no external funding.