Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Dang-Linh Trinh; Minh-Cong Vo; Soo-Hyung Kim; Hyung-Jeong Yang; Guee-Sang Lee

doi:10.3390/s23010200

Sensors (Basel). 2022 Dec 24;23(1):200. doi: 10.3390/s23010200.

Authors

Dang-Linh Trinh¹, Minh-Cong Vo¹, Soo-Hyung Kim¹, Hyung-Jeong Yang¹, Guee-Sang Lee¹

Affiliation

¹ Department of Artificial Intelligence Convergence, Chonnam National University, 77 Yongbong-ro, Gwangju 500-757, Republic of Korea.

Abstract

Speech emotion recognition (SER) has attracted considerable research attention in recent years. However, emotion recognition from non-verbal speech (known as vocal bursts) is still sparsely studied. A vocal burst is short and carries no linguistic content, which makes it harder to handle than verbal speech. In this paper, we therefore propose a self-relation attention and temporal awareness (SRA-TA) module for vocal bursts, which captures long-term dependencies and focuses on the salient parts of the audio signal. Our proposed method contains three main stages. First, latent features are extracted with a self-supervised learning model from the raw audio signal and its Mel-spectrogram. Second, the SRA-TA module is applied to capture the valuable information in these latent features. Third, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of 10 emotions. Our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.

Keywords: self-relation attention; self-supervised model; temporal awareness; vocal burst.
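
The reported metric, the mean concordance correlation coefficient, can be illustrated with a minimal sketch. It assumes the standard definition of Lin's CCC with population (1/n) variance and covariance, and it averages the per-emotion CCC over the columns of a score table. The function names `ccc` and `mean_ccc` are hypothetical, and this is not the challenge's official evaluation script.

```python
def ccc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences.

    Assumes the standard definition:
        2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t))**2)
    with population (1/n) statistics. Raises ZeroDivisionError if both
    sequences are constant and equal, since the denominator is then zero.
    """
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and the same length")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


def mean_ccc(preds, targets):
    """Average the CCC over emotion columns.

    preds and targets are lists of rows, one row per audio sample and
    one column per emotion (10 columns in the task described above).
    """
    pred_cols = list(zip(*preds))      # transpose rows into per-emotion columns
    target_cols = list(zip(*targets))
    scores = [ccc(p, t) for p, t in zip(pred_cols, target_cols)]
    return sum(scores) / len(scores)
```

Unlike the Pearson correlation, the CCC also penalizes a shift in mean or scale between predictions and labels, so predictions that are perfectly correlated with the targets but offset from them score below 1.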

MeSH terms

Attention
Emotions*
Speech
Speech Perception*
