Real-Time Multi-Person Video Synthesis with Controllable Prior-Guided Matting

Aoran Chen; Hai Huang; Yueyan Zhu; Junsheng Xue

doi:10.3390/s24092795

Sensors (Basel). 2024 Apr 27;24(9):2795. doi: 10.3390/s24092795.

Authors

Aoran Chen¹, Hai Huang¹, Yueyan Zhu¹, Junsheng Xue¹

Affiliation

¹ School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Abstract

To improve matting performance in dynamic multi-person scenes, we introduce a robust, real-time, high-resolution, and controllable human video matting method that achieves state-of-the-art results on all evaluated metrics. Unlike most existing methods, which process video frame by frame as independent images, we design a unified architecture that uses a controllable generation model to compensate for the lack of overall semantic information in multi-person video. Our method, called ControlMatting, uses an independent recurrent architecture to exploit temporal information in videos and achieves significant improvements in temporal coherence and matting detail. ControlMatting adopts a mixed training strategy that combines matting and semantic segmentation datasets, which effectively improves the model's semantic understanding. Furthermore, we propose a novel deep learning-based image filter algorithm that strengthens detail enhancement for both the matting and segmentation objectives. Our experiments show that prior information about the human body, taken from the image itself, can effectively counter the mask defects caused by complex dynamic scenes with multiple people.

Keywords: controllable information; deep guided filter; deep learning; video matting.

Grants and funding

2021YFF0900700/National Key R&D Program of China