A multi-stage dynamical fusion network for multimodal emotion recognition

Sihan Chen; Jiajia Tang; Li Zhu; Wanzeng Kong

doi:10.1007/s11571-022-09851-w

Cogn Neurodyn. 2023 Jun;17(3):671-680. doi: 10.1007/s11571-022-09851-w. Epub 2022 Jul 31.

Authors

Sihan Chen¹, Jiajia Tang², Li Zhu², Wanzeng Kong²

Affiliations

¹ HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou, China.
² The College of Computer Science, Hangzhou Dianzi University, Hangzhou, China.

PMID: 37265659
PMCID: PMC10229484 (available on 2024-06-01)
DOI: 10.1007/s11571-022-09851-w

Abstract

In recent years, emotion recognition using physiological signals has become a popular research topic. Physiological signals reflect an individual's genuine emotional state and are therefore widely applied to emotion recognition. Multimodal signals provide more discriminative information than a single modality, which has attracted the interest of researchers in the field. However, current studies on multimodal emotion recognition typically adopt a one-stage fusion method, which overlooks cross-modal interaction. To solve this problem, we propose a multi-stage multimodal dynamical fusion network (MSMDFN), through which a joint representation based on cross-modal correlation is obtained. First, the latent and essential interactions among the features extracted independently from each modality are explored in a specific manner. Subsequently, the multi-stage fusion network splits the fusion procedure into multiple stages using the correlations observed in the first step. This allows us to exploit more fine-grained unimodal, bimodal, and trimodal intercorrelations. The MSMDFN was evaluated on the multimodal benchmark DEAP. The experiments indicate that our method outperforms related one-stage multimodal emotion recognition methods.
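The two steps described in the abstract (measuring cross-modal correlation, then fusing in stages) can be illustrated with a minimal NumPy sketch. This is not the authors' MSMDFN: the abstract does not specify the correlation measure, the fusion operator, or the modalities, so everything here is an assumption made for illustration. The modality names (`eeg`, `eog`, `gsr`), the helper functions `mean_abs_correlation`, `fuse`, and `multi_stage_fusion`, the use of Pearson correlation, and the outer-product interaction term are all hypothetical.

```python
import itertools
import numpy as np


def mean_abs_correlation(a, b):
    """Mean absolute Pearson correlation between the feature columns of a and b.

    Assumption: Pearson correlation stands in for the unspecified
    cross-modal correlation measure.
    """
    a_z = (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-8)
    b_z = (b - b.mean(axis=0)) / (b.std(axis=0) + 1e-8)
    corr = a_z.T @ b_z / a.shape[0]  # shape: (d_a, d_b)
    return float(np.abs(corr).mean())


def fuse(a, b, weight):
    """Hypothetical fusion step: keep both inputs and append a
    correlation-weighted, flattened outer-product interaction term."""
    n = a.shape[0]
    interaction = np.einsum("ni,nj->nij", a, b).reshape(n, -1)
    return np.concatenate([a, b, weight * interaction], axis=1)


def multi_stage_fusion(features):
    """Fuse exactly three modalities in two stages.

    Stage 1 fuses the most strongly correlated pair (bimodal);
    stage 2 fuses that result with the remaining modality (trimodal).
    """
    names = list(features)
    scores = {
        pair: mean_abs_correlation(features[pair[0]], features[pair[1]])
        for pair in itertools.combinations(names, 2)
    }
    first, second = max(scores, key=scores.get)
    bimodal = fuse(features[first], features[second], scores[(first, second)])

    (third,) = [name for name in names if name not in (first, second)]
    weight = mean_abs_correlation(bimodal, features[third])
    trimodal = fuse(bimodal, features[third], weight)
    return trimodal, [(first, second), third]


# Synthetic stand-in features (200 samples, 4 features per modality).
rng = np.random.default_rng(0)
eeg = rng.normal(size=(200, 4))
eog = eeg + 0.1 * rng.normal(size=(200, 4))  # built to correlate with eeg
gsr = rng.normal(size=(200, 4))              # independent of the others

joint, order = multi_stage_fusion({"eeg": eeg, "eog": eog, "gsr": gsr})
print(order, joint.shape)
```

In this toy setup the correlated pair is fused first and the independent modality is added in the second stage, so the joint representation carries unimodal, bimodal, and trimodal terms. A one-stage baseline would instead combine all three feature sets in a single step.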

Keywords: Emotion recognition; Multi-stage fusion; Multimodal dynamic fusion; Physiological signals.