Multimodal mental health assessment with remote interviews using facial, vocal, linguistic, and cardiovascular patterns

medRxiv [Preprint]. 2023 Sep 12:2023.09.11.23295212. doi: 10.1101/2023.09.11.23295212.

Abstract

Objective: The current clinical practice of psychiatric evaluation suffers from subjectivity and bias, and requires highly skilled professionals that are often unavailable or unaffordable. Objective digital biomarkers have shown the potential to address these issues. In this work, we investigated whether behavioral and physiological signals, extracted from remote interviews, provided complimentary information for assessing psychiatric disorders.

Methods: Time series of multimodal features were derived from four conceptual modes: facial expression, vocal expression, linguistic expression, and cardiovascular modulation. The features were extracted from simultaneously recorded audio and video of remote interviews using task-specific and foundation models. Averages, standard deviations, and hidden Markov model-derived statistics of these features were computed from 73 subjects. Four binary classification tasks were defined: detecting 1) any clinically-diagnosed psychiatric disorder, 2) major depressive disorder, 3) self-rated depression, and 4) self-rated anxiety. Each modality was evaluated individually and in combination.

Results: Statistically significant feature differences were found between controls and subjects with mental health conditions. Correlations were found between features and self-rated depression and anxiety scores. Visual heart rate dynamics achieved the best unimodal performance with areas under the receiver-operator curve (AUROCs) of 0.68-0.75 (depending on the classification task). Combining multiple modalities achieved AUROCs of 0.72-0.82. Features from task-specific models outperformed features from foundation models.

Conclusion: Multimodal features extracted from remote interviews revealed informative characteristics of clinically diagnosed and self-rated mental health status.

Significance: The proposed multimodal approach has the potential to facilitate objective, remote, and low-cost assessment for low-burden automated mental health services.

Publication types

  • Preprint