Geometry-Guided Dense Perspective Network for Speech-Driven Facial Animation

Jingying Liu; Binyuan Hui; Kun Li; Yunke Liu; Yu-Kun Lai; Yuxiang Zhang; Yebin Liu; Jingyu Yang

doi:10.1109/TVCG.2021.3107669

IEEE Trans Vis Comput Graph. 2022 Dec;28(12):4873-4886. doi: 10.1109/TVCG.2021.3107669. Epub 2022 Oct 26.

PMID: 34449390

Abstract

Realistic speech-driven 3D facial animation is a challenging problem due to the complex relationship between speech and the face. In this paper, we propose a deep architecture, called Geometry-guided Dense Perspective Network (GDPnet), to achieve speaker-independent realistic 3D facial animation. The encoder is designed with dense connections to strengthen feature propagation and encourage the re-use of audio features, and the decoder is integrated with an attention mechanism to adaptively recalibrate point-wise feature responses by explicitly modeling interdependencies between different neuron units. We also introduce a non-linear face reconstruction representation as guidance for the latent space to obtain more accurate deformation, which helps solve the geometry-related deformation and improves generalization across subjects. Huber and HSIC (Hilbert-Schmidt Independence Criterion) constraints are adopted to promote the robustness of our model and to better exploit the non-linear and high-order correlations. Experimental results on a public dataset and a real scanned dataset validate the superiority of the proposed GDPnet over state-of-the-art models. The code is available for research purposes at http://cic.tju.edu.cn/faculty/likun/projects/GDPnet.
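
The two constraints named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the Gaussian kernel, the bandwidth `sigma`, and the threshold `delta` are all assumptions chosen for illustration, and the abstract does not state which quantities the constraints are applied to or how they are weighted. The sketch shows only the standard definitions: the Huber loss, which is quadratic for small residuals and linear for large ones, and the biased empirical HSIC estimator trace(KHLH) / (n - 1)^2, which measures statistical dependence between two sets of paired samples.

```python
import math


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss over paired scalar sequences (delta is an assumed threshold)."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r            # quadratic region for small residuals
        else:
            total += delta * (r - 0.5 * delta)  # linear region limits outlier influence
    return total / len(pred)


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel over a list of equal-length vectors."""
    n = len(xs)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) * ones."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [gram[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between paired samples xs and ys."""
    n = len(xs)
    kc = center(gaussian_gram(xs, sigma))
    lc = center(gaussian_gram(ys, sigma))
    # trace(Kc @ Lc) equals trace(K H L H) because H is idempotent.
    return sum(kc[i][j] * lc[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2
```

For example, `hsic(features_a, features_b)` is close to zero when one input carries no information about the other and grows as the two become more dependent, which is why it can serve as a training term that captures non-linear correlations.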

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computer Graphics
Face / diagnostic imaging
Face / physiology
Humans
Imaging, Three-Dimensional* / methods
Speech* / physiology