Adolescent Depression Detection Model Based on Multimodal Data of Interview Audio and Text

Int J Neural Syst. 2022 Nov;32(11):2250045. doi: 10.1142/S0129065722500459. Epub 2022 Aug 26.

Abstract

Depression is a common mental disorder that increasingly develops at a younger age. Early detection of depression combined with psychological intervention may effectively prevent youth suicide, and a computer-aided model may make early detection more efficient. However, existing methods for the automatic detection of depression mostly rely on unimodal data. Clinical research shows that patients with depression exhibit distinctive characteristics in speech, text, facial expression, and other modalities. Multimodal machine learning is emerging but is not yet widely used for the detection of psychiatric disorders. The problem with existing multimodal detection models is that feature fusion considers only global or only local information, which leads to low detection accuracy. Therefore, this study constructs an automatic detection model for adolescent depression based on multimodal machine learning. The proposed method first extracts four features from audio and text, both globally and locally; then constructs a coarse-grained fusion model and a fine-grained fusion model based on these four features; and finally fuses the coarse-grained and fine-grained fusion models. Experiments on a real-world dataset demonstrate that the proposed method can improve the accuracy of automatic depression detection.
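
The three-stage pipeline described above (four features, a coarse-grained and a fine-grained fusion, then a final fusion) can be illustrated with a minimal sketch. The abstract does not specify the fusion operators, so everything below is an assumption made for illustration: concatenation for the coarse-grained step, scaled dot-product cross-attention for the fine-grained step (suggested only by the "attention mechanism" keyword), and concatenation for the final step. All function names, vector sizes, and values are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(seq):
    """Average a sequence of equal-length vectors into a single vector."""
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]

def cross_attention(queries, keys):
    """For each query vector, return an attention-weighted sum of the key vectors."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)])
    return out

def coarse_fusion(audio_global, text_global):
    # Assumption: coarse-grained fusion is plain concatenation of global features.
    return audio_global + text_global

def fine_fusion(audio_local, text_local):
    # Assumption: each text step attends over the audio frames, and the paired
    # (text, attended audio) vectors are mean-pooled into one vector.
    attended = cross_attention(text_local, audio_local)
    joined = [t + a for t, a in zip(text_local, attended)]
    return mean_pool(joined)

def fuse(audio_global, text_global, audio_local, text_local):
    # Assumption: the final fusion concatenates the two fused representations.
    return coarse_fusion(audio_global, text_global) + fine_fusion(audio_local, text_local)

# Hypothetical toy features: two global vectors and two local sequences.
audio_global = [0.2, 0.5, 0.1]
text_global = [0.7, 0.3]
audio_local = [[0.1, 0.4], [0.3, 0.2], [0.5, 0.9]]  # 3 audio frames, dim 2
text_local = [[0.6, 0.1], [0.2, 0.8]]               # 2 text tokens, dim 2

fused = fuse(audio_global, text_global, audio_local, text_local)
print(len(fused))
```

In a real system the fused vector would feed a classifier and the fusion weights would be learned; this sketch only shows how global and local information from both modalities can end up in one representation.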

Keywords: Depression detection model; attention mechanism; deep learning; multimodal data fusion.

MeSH terms

  • Adolescent
  • Depression* / diagnosis
  • Humans
  • Machine Learning*