FreeGaze: A Framework for 3D Gaze Estimation Using Appearance Cues from a Facial Video

Sensors (Basel). 2023 Dec 4;23(23):9604. doi: 10.3390/s23239604.

Abstract

Gaze is a significant behavioral characteristic that reflects a person's attention. In recent years, there has been growing interest in estimating gaze from facial videos. However, gaze estimation remains a challenging problem due to variations in appearance and head pose. To address this, we develop a framework for 3D gaze estimation using appearance cues. The framework begins with an end-to-end approach to detect facial landmarks. Subsequently, we employ a normalization method and improve it using orthogonal matrices; comparative experiments show that the improved normalization method yields higher accuracy and lower computational time in gaze estimation. Finally, we introduce a dual-branch convolutional neural network, named FG-Net, which processes the normalized images and extracts eye and face features through two branches. The extracted features are then integrated and input into a fully connected layer to estimate the 3D gaze vectors. To evaluate the performance of our approach, we conduct ten-fold cross-validation experiments on two public datasets, MPIIGaze and EyeDiap, achieving angular errors of 3.11° and 2.75°, respectively. The results demonstrate the effectiveness of the proposed framework and its state-of-the-art performance in 3D gaze estimation.
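The degree figures reported above are conventionally computed in 3D gaze estimation as the angle between the predicted and ground-truth gaze vectors, averaged over test samples. The abstract does not give the formula, so the following is only a minimal sketch under that assumption; the function names `angular_error_deg` and `mean_angular_error_deg` are hypothetical and not taken from the paper.

```python
import math


def angular_error_deg(pred, truth):
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(p * t for p, t in zip(pred, truth))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in truth))
    if norm_p == 0 or norm_t == 0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, truths):
    """Average angular error over paired lists of predicted and true vectors."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need equally sized, non-empty lists of vectors")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)
```

For example, `mean_angular_error_deg(predictions, labels)` could be evaluated once per cross-validation fold and the fold results averaged; whether the paper aggregates its folds this way is an assumption.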

Keywords: dual-branch CNN; eye features; face features; gaze estimation; improved normalization.

MeSH terms

  • Attention
  • Cues*
  • Face*
  • Humans
  • Neural Networks, Computer

Grants and funding

This research received no external funding.