Explainable Depression Detection Based on Facial Expression Using LSTM on Attentional Intermediate Feature Fusion with Label Smoothing

Yanisa Mahayossanunt; Natawut Nupairoj; Solaphat Hemrungrojn; Peerapon Vateekul

doi:10.3390/s23239402

Sensors (Basel). 2023 Nov 25;23(23):9402. doi: 10.3390/s23239402.

Authors

Yanisa Mahayossanunt¹, Natawut Nupairoj¹ ², Solaphat Hemrungrojn² ³ ⁴, Peerapon Vateekul¹ ² ⁴

Affiliations

¹ Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Phayathai Rd, Pathumwan, Bangkok 10330, Thailand.
² Center of Excellence in Digital and AI Innovation for Mental Health (AIMET), Chulalongkorn University, Phayathai Rd, Pathumwan, Bangkok 10330, Thailand.
³ Department of Psychiatry, Faculty of Medicine, Chulalongkorn University, Phayathai Rd, Pathumwan, Bangkok 10330, Thailand.
⁴ Cognitive Fitness and Biopsychiatry Technology Research Unit, Faculty of Medicine, Chulalongkorn University, Phayathai Rd, Pathumwan, Bangkok 10330, Thailand.

Abstract

Machine learning can provide a fast pre-diagnosis approach that helps prevent the effects of Major Depressive Disorder (MDD). The objective of this research is to detect depression using a set of important facial features extracted from interview videos, e.g., radians, gaze angles, and action unit intensity. The model is based on LSTM with an attention mechanism and combines those features using an intermediate fusion approach. Label smoothing is applied to further improve the model's performance. Unlike other black-box models, the model is explained with integrated gradients, which show the important features for each patient. The experiment was conducted on 474 video samples collected at Chulalongkorn University, divided into 134 depressed and 340 non-depressed samples. The results showed that our model performed best, with an 88.89% F1-score, 87.03% recall, 91.67% accuracy, and 91.40% precision. Moreover, the model can capture important features of depression, including head turning, no specific gaze, slow eye movement, no smiles, frowning, grumbling, and scowling. These features express a lack of concentration, social disinterest, and negative feelings, consistent with the assumptions in depressive theories.
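As an illustration of the label smoothing mentioned in the abstract, the following is a minimal sketch of the standard formulation, in which a one-hot target is blended with a uniform distribution over the classes. The smoothing factor `epsilon=0.1`, the class ordering, the example prediction, and the helper names `smooth_labels` and `cross_entropy` are assumptions made for illustration; the abstract does not state the values or the implementation used in the study.

```python
import math

def smooth_labels(one_hot, epsilon=0.1):
    """Blend a one-hot target with the uniform distribution over classes.

    epsilon is a hypothetical smoothing factor, not a value from the paper.
    """
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]

def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

# Two classes; index 0 = non-depressed, index 1 = depressed (ordering assumed).
hard = [0.0, 1.0]
soft = smooth_labels(hard, epsilon=0.1)  # approximately [0.05, 0.95]

# A very confident (hypothetical) prediction for the depressed class.
pred = [0.02, 0.98]

loss_hard = cross_entropy(hard, pred)
loss_soft = cross_entropy(soft, pred)

# With smoothed targets, an overconfident prediction still incurs a
# noticeable loss, which discourages extreme output probabilities.
print(f"hard-label loss: {loss_hard:.4f}")
print(f"soft-label loss: {loss_soft:.4f}")
```

In this sketch the smoothed target keeps most of its mass on the true class, so the predicted class is unchanged; only the penalty for overconfidence differs.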

Keywords: deep learning; depression detection; facial expression.

MeSH terms

Depression / diagnosis
Depressive Disorder, Major* / diagnosis
Emotions
Eye Movements
Facial Expression*
Humans

Grants and funding

This research received no external funding.