Multimodal Sentiment Analysis Representations Learning via Contrastive Learning with Condense Attention Fusion

Sensors (Basel). 2023 Mar 1;23(5):2679. doi: 10.3390/s23052679.

Abstract

Multimodal sentiment analysis has gained popularity as a research field because it can predict users' emotional tendencies more comprehensively. The data fusion module is a critical component of multimodal sentiment analysis, as it integrates information from multiple modalities. However, combining modalities effectively and removing redundant information remain challenging. In our research, we address these challenges by proposing a multimodal sentiment analysis model based on supervised contrastive learning, which yields more effective data representations and richer multimodal features. Specifically, we introduce the MLFC module, which uses a convolutional neural network (CNN) and a Transformer to reduce the redundancy in each modality's features and filter out irrelevant information. Moreover, our model employs supervised contrastive learning to enhance its ability to learn common sentiment features from the data. We evaluate our model on three widely used datasets, namely MVSA-single, MVSA-multiple, and HFM, demonstrating that it outperforms state-of-the-art models. Finally, we conduct ablation experiments to validate the efficacy of the proposed method.

Keywords: MLFC; SCSupCon; multimodal; multimodal sentiment analysis; supervised contrastive learning.
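The supervised contrastive objective the abstract refers to (SupCon, Khosla et al., 2020) pulls embeddings with the same sentiment label together and pushes different-label embeddings apart. The abstract does not give implementation details, so the following NumPy sketch is illustrative only; the function name, temperature value, and batch setup are assumptions, not the paper's code.

```python
# Illustrative sketch of a supervised contrastive (SupCon) loss in NumPy.
# Names and defaults are assumptions; the paper's implementation is not shown.
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """features: (N, D) embeddings; labels: (N,) class ids.

    Each anchor's positives are all other samples sharing its label;
    every anchor must have at least one positive.
    """
    # L2-normalize so dot products are cosine similarities.
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature      # (N, N) scaled similarities
    n = len(labels)
    logits_mask = ~np.eye(n, dtype=bool)           # exclude self-comparisons
    # Subtract per-row max for numerical stability (cancels in log-prob).
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * logits_mask
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    mean_log_prob_pos = (pos_mask * log_prob).sum(axis=1) / pos_mask.sum(axis=1)
    return -mean_log_prob_pos.mean()
```

With embeddings perfectly clustered by label, the loss reduces to log of the positive count plus a vanishing negative term, and mislabeling the same embeddings drives it up sharply, which is the behavior the training objective exploits.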