Detecting emotional valence using time-domain analysis of speech signals

Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul:2019:3605-3608. doi: 10.1109/EMBC.2019.8857691.

Abstract

Mental health is a growing concern, and its problems range from an inability to cope with day-to-day stress to severe conditions like depression. The ability to detect these symptoms relies heavily on accurate measurements of emotion and its components, such as emotional valence, which comprises positive, negative and neutral affect. Speech is an attractive bio-signal for measuring valence because of the ubiquity of smartphones, which can easily record and process speech signals. Speech-based emotion detection uses a broad spectrum of features derived from audio samples, including pitch, energy, Mel Frequency Cepstral Coefficients (MFCCs), Linear Predictive Cepstral Coefficients, Log frequency power coefficients, spectrograms and so on. Despite this array of features and classifiers, detecting valence from speech alone remains a challenge. Further, the algorithms for extracting some of these features are compute-intensive. This becomes a problem particularly in smartphone applications, where the algorithms have to be executed on the device itself. We propose a novel time-domain feature that not only improves valence detection accuracy but also saves 10% of the computational cost of extraction compared with MFCCs. A Random Forest Regressor operating on the proposed feature set detects speaker-independent valence on a non-acted database with 70% accuracy. The algorithm also achieves 100% accuracy when tested on the acted speech database Emo-DB.
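
The abstract does not specify the proposed time-domain feature, so the following is only a generic sketch of what frame-wise time-domain extraction can look like. It computes two common descriptors, short-time energy and zero-crossing rate, directly from the samples without any spectral transform. The function names, the 25 ms frame length, the 10 ms hop and the synthetic 200 Hz tone are all illustrative assumptions and are not taken from the paper.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def time_domain_features(samples, frame_len=400, hop=160):
    """Return one (energy, zero-crossing rate) pair per frame.

    The defaults correspond to 25 ms frames with a 10 ms hop at 16 kHz;
    these values are assumptions chosen for illustration only.
    """
    frames = frame_signal(samples, frame_len, hop)
    return [(short_time_energy(f), zero_crossing_rate(f)) for f in frames]

# Hypothetical input: one second of a 200 Hz sine tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
features = time_domain_features(tone)
```

Per-frame values like these would typically be summarized per utterance (for example by mean and variance) before being passed to a classifier or regressor. Both descriptors need only a single pass over the samples, which is the kind of property that makes time-domain features attractive for on-device use.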

MeSH terms

  • Algorithms*
  • Databases, Factual
  • Emotions*
  • Humans
  • Speech*