Speech Emotion Recognition Applied to Real-World Medical Consultation

Stud Health Technol Inform. 2024 Jan 25:310:1121-1125. doi: 10.3233/SHTI231139.

Abstract

Since 2020, the COVID-19 pandemic has changed our healthcare behaviors. Mandatory mask-wearing has substantially affected how doctor-patient interactions are perceived, so building a satisfying relationship can no longer rely on empathizing through facial expressions alone. The voice becomes more important as a way to overcome the barrier that masks create. Hence, verbal and non-verbal communication will be crucial criteria for doctor-patient interaction during medical consultations and other conversations. In recent years, speech emotion recognition has been a popular research domain. Despite the abundant work conducted, nonverbal emotion recognition in medical scenarios remains underexplored. In this study, we investigate YAMNet transfer learning on the Chinese Mandarin speech corpus NTHU-NTUA Chinese Interactive Emotion Corpus (NNIME) and use real-world dermatology clinic recordings to test the model's generalization capability. The results showed that the validation accuracy on the NNIME data was 0.59 for activation prediction and 0.57 for valence prediction. On the doctor-patient dataset, the validation accuracy was 0.24 for activation and 0.58 for valence.
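The abstract names YAMNet transfer learning and (in the keywords) a bidirectional LSTM, but does not give the architecture or hyperparameters. The following is a minimal, untrained sketch under stated assumptions: each utterance is represented as the sequence of 1024-dimensional embeddings that YAMNet emits per audio frame, a BiLSTM summarizes that sequence, and a softmax layer predicts a class. The three-class labeling, the hidden size of 64, the use of one separate model per dimension (activation, valence), and the names `BiLSTMClassifier` and `accuracy` are hypothetical choices for illustration, not the authors' method; random vectors stand in for real YAMNet output.

```python
import numpy as np

# YAMNet produces one 1024-d embedding per ~0.96 s frame (0.48 s hop).
EMBED_DIM = 1024


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMDirection:
    """One direction of an LSTM with random (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, scale, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def run(self, seq):
        """Consume seq of shape (T, input_dim); return the final hidden state."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x_t in seq:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h


class BiLSTMClassifier:
    """Hypothetical BiLSTM head over a sequence of YAMNet embeddings."""

    def __init__(self, n_classes=3, hidden_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = LSTMDirection(EMBED_DIM, hidden_dim, rng)
        self.bwd = LSTMDirection(EMBED_DIM, hidden_dim, rng)
        self.W_out = rng.normal(0.0, 0.1, size=(n_classes, 2 * hidden_dim))
        self.b_out = np.zeros(n_classes)

    def predict_proba(self, embeddings):
        # Read the frames forwards and backwards, then concatenate both summaries.
        h = np.concatenate([self.fwd.run(embeddings), self.bwd.run(embeddings[::-1])])
        logits = self.W_out @ h + self.b_out
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted class matches the label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    utterance = rng.normal(size=(10, EMBED_DIM))  # stand-in for 10 YAMNet frames
    activation_model = BiLSTMClassifier(seed=0)
    valence_model = BiLSTMClassifier(seed=1)
    print("activation probs:", activation_model.predict_proba(utterance))
    print("valence probs:", valence_model.predict_proba(utterance))
```

Accuracy here is computed separately for each dimension, which matches how the abstract reports activation and valence results as two distinct figures; in practice the classifier weights would be trained on NNIME before being evaluated on the clinic recordings.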

Keywords: Speech emotion recognition; YAMNet transfer learning; bidirectional long short-term memory networks; doctor-patient communication; medical education.

MeSH terms

  • Emotions
  • Humans
  • Perception
  • Referral and Consultation
  • Speech*
  • Voice*