Multi-modal depression detection based on emotional audio and evaluation text

J Affect Disord. 2021 Dec 1:295:904-913. doi: 10.1016/j.jad.2021.08.090. Epub 2021 Sep 2.

Abstract

Background: Early detection of depression is important for effective treatment. Given the inefficiency of current screening methods, automatic depression identification is a challenging research problem with clear practical value.

Methods: We propose a new experimental method for depression detection based on audio and text. 160 Chinese subjects were investigated in this study. Notably, we designed a text-reading experiment to induce rapid emotional changes in subjects, referred to below as the Segmental Emotional Speech Experiment (SESE). We extract 384-dimensional low-level audio features to examine how they differ across the emotional changes elicited in SESE. In addition, we propose a multi-modal fusion method based on DeepSpectrum features and word-vector features that uses deep learning to detect depression.
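
The abstract does not describe the fusion architecture, so the following is only a minimal illustrative sketch of late fusion by feature concatenation, written in PyTorch. The feature dimensions, hidden size, and classifier head are assumptions for illustration, not the authors' reported model.

    import torch
    import torch.nn as nn

    class FusionClassifier(nn.Module):
        """Late-fusion sketch: concatenate an audio feature vector and a
        text feature vector, then classify with a small MLP. Dimensions
        are placeholders, not values reported in the paper."""

        def __init__(self, audio_dim=4096, text_dim=300, hidden_dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(audio_dim + text_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.5),
                nn.Linear(hidden_dim, 2),  # case vs. control
            )

        def forward(self, audio_feat, text_feat):
            # audio_feat: (batch, audio_dim), e.g. DeepSpectrum-style embeddings
            # text_feat:  (batch, text_dim), e.g. averaged word vectors
            fused = torch.cat([audio_feat, text_feat], dim=1)
            return self.net(fused)

    # Toy forward pass with random features, just to show the shapes.
    model = FusionClassifier()
    audio = torch.randn(8, 4096)
    text = torch.randn(8, 300)
    logits = model(audio, text)  # shape: (8, 2)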

Results: Our experiments show that SESE improves the accuracy of depression recognition, and we found differences in the low-level audio features. The results were verified across the case and control groups and across gender and age groupings. The multi-modal fusion model also achieves a satisfactory accuracy of 0.912 and an F1 score of 0.906.
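
For reference, accuracy is the fraction of correct predictions and the F1 score is the harmonic mean of precision and recall. The snippet below is a minimal sketch of how these metrics are typically computed with scikit-learn; the labels are synthetic placeholders, not the study's data.

    from sklearn.metrics import accuracy_score, f1_score

    # Synthetic ground-truth and predicted labels (1 = case, 0 = control).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

    print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
    print(f1_score(y_true, y_pred))        # harmonic mean of precision and recall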

Conclusions: Our contribution is twofold. First, we propose and validate SESE, which offers a new experimental paradigm for future researchers. Second, we propose a new and efficient multi-modal depression recognition model.

Keywords: Artificial intelligence; Deep learning; Depression; Multi-modality.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Depression* / diagnosis
  • Emotions
  • Humans
  • Speech*