Multi-Modality Emotion Recognition Model with GAT-Based Multi-Head Inter-Modality Attention

Changzeng Fu; Chaoran Liu; Carlos Toshinori Ishi; Hiroshi Ishiguro

doi:10.3390/s20174894

Sensors (Basel). 2020 Aug 29;20(17):4894. doi: 10.3390/s20174894.

Authors

Changzeng Fu¹,², Chaoran Liu¹, Carlos Toshinori Ishi¹,³, Hiroshi Ishiguro¹,²

Affiliations

¹ Advanced Telecommunications Research Institute International, Kyoto 619-0288, Japan.
² Graduate School of Engineering Science, Osaka University, Osaka 560-8531, Japan.
³ Interactive Robot Research Team, Robotics Project, RIKEN, Kyoto 619-0288, Japan.

Abstract

Emotion recognition has been gaining attention in recent years due to its applications in artificial agents. To achieve good performance on this task, much research has been conducted on multi-modality emotion recognition models that leverage the different strengths of each modality. However, a research question remains: what is the most appropriate way to fuse the information from different modalities? In this paper, we propose audio sample augmentation and an emotion-oriented encoder-decoder to improve the performance of emotion recognition, and we discuss an inter-modality, decision-level fusion method based on a graph attention network (GAT). Compared to the baseline, our model improved the weighted average F1-score from 64.18% to 68.31% and the weighted average accuracy from 65.25% to 69.88%.

Keywords: emotion recognition; graph attention network; multi-modality.
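
The abstract names a graph attention network (GAT) as the basis for inter-modality fusion but does not give its details. As a rough illustration only, the following is a minimal sketch of one generic GAT attention head applied to a small, fully connected graph whose nodes stand in for per-modality representations. The function name `gat_attention`, the number of nodes, the feature sizes, and the random inputs are all assumptions made for this example and are not taken from the paper; the authors' actual architecture may differ.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Element-wise LeakyReLU."""
    return np.where(x > 0, x, slope * x)


def gat_attention(h, W, a, slope=0.2):
    """One generic GAT attention head over a fully connected graph.

    h: (n, d_in) node features, one row per (hypothetical) modality node
    W: (d_in, d_out) shared linear projection
    a: (2 * d_out,) attention vector
    Returns the attended node features (n, d_out) and the
    attention weights (n, n), where each row sums to 1.
    """
    z = h @ W                                   # projected features, (n, d_out)
    d_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term and a destination term.
    src = z @ a[:d_out]                         # (n,)
    dst = z @ a[d_out:]                         # (n,)
    e = leaky_relu(src[:, None] + dst[None, :], slope)  # raw scores, (n, n)
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


# Illustrative inputs only: 3 hypothetical modality nodes with 4 features each.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))

fused, alpha = gat_attention(h, W, a)
print(fused.shape, alpha.shape)
```

A multi-head variant would typically run several such heads with separate `W` and `a` and then concatenate or average their outputs; which of these the paper uses is not stated in the abstract.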

MeSH terms

Emotions*
Humans
Pattern Recognition, Automated*