Automated depression analysis using convolutional neural networks from speech

J Biomed Inform. 2018 Jul:83:103-111. doi: 10.1016/j.jbi.2018.05.007. Epub 2018 May 29.

Abstract

To help clinicians to efficiently diagnose the severity of a person's depression, the affective computing community and the artificial intelligence field have shown a growing interest in designing automated systems. The speech features have useful information for the diagnosis of depression. However, manually designing and domain knowledge are still important for the selection of the feature, which makes the process labor consuming and subjective. In recent years, deep-learned features based on neural networks have shown superior performance to hand-crafted features in various areas. In this paper, to overcome the difficulties mentioned above, we propose a combination of hand-crafted and deep-learned features which can effectively measure the severity of depression from speech. In the proposed method, Deep Convolutional Neural Networks (DCNN) are firstly built to learn deep-learned features from spectrograms and raw speech waveforms. Then we manually extract the state-of-the-art texture descriptors named median robust extended local binary patterns (MRELBP) from spectrograms. To capture the complementary information within the hand-crafted features and deep-learned features, we propose joint fine-tuning layers to combine the raw and spectrogram DCNN to boost the depression recognition performance. Moreover, to address the problems with small samples, a data augmentation method was proposed. Experiments conducted on AVEC2013 and AVEC2014 depression databases show that our approach is robust and effective for the diagnosis of depression when compared to state-of-the-art audio-based methods.

Keywords: Automatic diagnosis; Depression; Median Robust extended Local Binary Patterns(MRELBP); Speech processing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Depression / diagnosis*
  • Diagnosis, Computer-Assisted*
  • Humans
  • Neural Networks, Computer*
  • Speech*