Comparing Manual and Machine Annotations of Emotions in Non-acted Speech

Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:4241-4244. doi: 10.1109/EMBC.2018.8513230.

Abstract

The focus on psychological well-being at the workplace has increased the demand for accurate emotion detection. Speech, one of the least obtrusive modes of capturing emotions at the workplace, still lacks robust emotion annotation mechanisms for non-acted corpora. In this paper, we extend our experiments on our non-acted speech database in two ways. First, we report how participants themselves perceive the emotion in their voice after a long gap of about six months, and how a third person, who had not heard the clips earlier, perceives the emotion in the same utterances. Both annotators also rated the intensity of the emotion. They agreed more often on neutral (84%) and negative clips (74%) than on positive ones (38%). Second, we restrict our attention to the samples on which the annotators agreed and show that machine learning achieves a classification accuracy of 80%, an improvement of 7% over the state-of-the-art results for speaker-dependent classification. This result suggests that the high-level perception of emotion does translate to the low-level features of speech. Further analysis shows that subtly expressed positive and negative emotions are often misinterpreted as neutral. For the speaker-independent test set, we report an overall accuracy of 61%.
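To illustrate the two analyses the abstract describes, the sketch below computes per-class agreement between the self and third-person annotations and a speaker-dependent classification accuracy on the agreed-upon clips. The function names (`per_class_agreement`, `utterance_features`, `speaker_dependent_accuracy`), the MFCC-statistics feature set, and the SVM classifier are assumptions made for illustration; the abstract does not specify the paper's actual features or model.

```python
"""Illustrative sketch only: feature set (MFCC statistics) and classifier
(SVM) are assumptions, not the paper's published pipeline."""
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

LABELS = ("neutral", "positive", "negative")


def per_class_agreement(self_labels, third_person_labels):
    """Fraction of clips where the third-person annotation matches the
    speaker's own label, reported separately for each emotion class."""
    agreement = {}
    for label in LABELS:
        idx = [i for i, s in enumerate(self_labels) if s == label]
        if not idx:
            continue
        hits = sum(1 for i in idx if third_person_labels[i] == label)
        agreement[label] = hits / len(idx)
    return agreement


def utterance_features(path, sr=16000, n_mfcc=13):
    """Utterance-level mean/std of frame-wise MFCCs (assumed low-level features)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def speaker_dependent_accuracy(paths, labels, seed=0):
    """Train/test split within one speaker's agreed-upon clips (sketch)."""
    X = np.vstack([utterance_features(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, stratify=labels, random_state=seed)
    clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```

Pooling frame-level coefficients into utterance-level statistics is one common way to obtain the fixed-length "low-level features of speech" the abstract refers to; a speaker-independent evaluation would instead hold out entire speakers rather than splitting each speaker's clips.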

Publication types

  • Comparative Study

MeSH terms

  • Emotions*
  • Humans
  • Machine Learning
  • Speech Perception*
  • Speech*
  • Voice*