A Driver Gaze Estimation Method Based on Deep Learning

Sensors (Basel). 2022 May 23;22(10):3959. doi: 10.3390/s22103959.

Abstract

Car crashes are among the top ten leading causes of death, and many can be attributed to distracted driving. An advanced driver-assistance technique (ADAT) can notify the driver of a dangerous scenario, reduce traffic crashes, and improve road safety. The main contribution of this work is the use of the driver's attention to build an efficient ADAT. To obtain this "attention value", a gaze tracking method is proposed. The driver's gaze direction is critical for discerning fatal distractions and for determining when the driver must be warned about risks on the road. This paper proposes a real-time gaze tracking system for the development of an ADAT that obtains and communicates the driver's gaze information. The developed ADAT system detects various head poses of the driver and estimates eye gaze directions, both of which play important roles in assisting the driver and avoiding unwanted circumstances. The first (and most significant) task in this research work was the development of a benchmark image dataset consisting of head poses and horizontal and vertical gaze directions of the driver's eyes. To detect the driver's face accurately and efficiently, the You Only Look Once (YOLO-V4) face detector was modified with the Inception-v3 CNN model for robust feature learning and improved face detection. Finally, transfer learning was performed on the InceptionResNet-v2 CNN, which served as the classification model for head pose detection and eye gaze angle estimation; for gaze estimation, a regression layer was added to the InceptionResNet-v2 CNN in place of the SoftMax and classification output layers. The proposed model detects head pose directions and estimates eye gaze directions with high accuracy: the head pose detection system achieved an average accuracy of 91%, and the model achieved an RMSE of 2.68 for vertical and 3.61 for horizontal eye gaze estimation.
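The abstract's transfer-learning step, swapping the SoftMax classification head of InceptionResNet-v2 for a regression output, can be illustrated with a minimal Keras sketch. This is an assumption-laden reconstruction, not the authors' code: the layer names, frozen backbone, optimizer, and two-angle output (horizontal, vertical) are illustrative choices.

```python
import tensorflow as tf

# Pretrained InceptionResNet-v2 backbone, loaded without its
# SoftMax classification head (include_top=False).
backbone = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
backbone.trainable = False  # freeze for the initial transfer-learning phase

# Regression head replacing the SoftMax/classification output layers:
# two continuous outputs, the horizontal and vertical gaze angles.
x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
gaze = tf.keras.layers.Dense(2, activation="linear", name="gaze_angles")(x)

model = tf.keras.Model(backbone.input, gaze)
model.compile(optimizer="adam",
              loss="mse",  # MSE training loss; RMSE reported as the metric
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
# model.fit(eye_images, gaze_angles, ...) on the driver gaze dataset
```

Using a linear regression output rather than SoftMax lets the network predict continuous gaze angles directly, which is what the paper's RMSE figures (2.68 vertical, 3.61 horizontal) evaluate.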

Keywords: CNN; Inception-v3; InceptionResNet-v2; You Only Look Once (YOLO); advanced driver-assistance technique.

MeSH terms

  • Deep Learning*
  • Eye
  • Eye Movements*
  • Fixation, Ocular
  • Head Movements

Grants and funding

This research was funded by the National Key Research and Development Program, grant numbers 2018YFB1600202 and 2021YFB1600205, and by the National Natural Science Foundation of China, grant number 52178407.