Multimodal Attention Dynamic Fusion Network for Facial Micro-Expression Recognition

Hongling Yang; Lun Xie; Hang Pan; Chiqin Li; Zhiliang Wang; Jialiang Zhong

doi:10.3390/e25091246


Entropy (Basel). 2023 Aug 22;25(9):1246. doi: 10.3390/e25091246.

Authors

Hongling Yang¹, Lun Xie², Hang Pan¹, Chiqin Li², Zhiliang Wang², Jialiang Zhong³

Affiliations

¹ Department of Computer Science, Changzhi University, Changzhi 046011, China.
² School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China.
³ School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China.

Abstract

The emotional changes in facial micro-expressions are combinations of action units (AUs). Researchers have shown that AUs can serve as auxiliary data to improve facial micro-expression recognition, and most existing methods fuse image features with AU information. However, these methods ignore the influence of AUs on the facial image feature extraction process itself. This paper therefore proposes a local detail feature enhancement model based on a multimodal attention dynamic fusion network (MADFN) for micro-expression recognition. The method uses a masked autoencoder with learnable class tokens to remove local regions of micro-expression images that carry little emotional information. An AU dynamic fusion module then fuses the AU representation into the image features to strengthen their latent representational ability. The model is evaluated on SMIC, CASME II, SAMM, and their composite 3DB-Combined dataset. It achieves competitive accuracy rates of 81.71%, 82.11%, and 77.21% on SMIC, CASME II, and SAMM, respectively. These results show that MADFN improves the discriminability of emotional features in facial images.
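The abstract names two steps but does not give their equations: dropping low-information image regions using a learnable class token, and dynamically fusing AU information into the image features. The NumPy sketch below shows one plausible reading of each step under stated assumptions. It is not the MADFN implementation.

The first assumption is that patches are scored by scaled dot-product attention from a class token, and only the top-scoring fraction is kept. The second is that AUs are merged into the pooled image feature through a per-dimension sigmoid gate. The function names `select_patches` and `au_gated_fusion`, the `keep_ratio`, the 12-dimensional AU vector, the chosen AU indices, and all weights are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def select_patches(patches, cls_token, keep_ratio=0.5):
    """Keep the patches the class token attends to most (hypothetical criterion)."""
    n, d = patches.shape
    scores = softmax(patches @ cls_token / np.sqrt(d))  # one attention weight per patch
    k = max(1, int(round(keep_ratio * n)))
    # Indices of the k highest-scoring patches, restored to spatial order.
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return patches[keep], keep, scores

def au_gated_fusion(img_feat, au_vec, w_au, w_gate, b_gate):
    """Blend an image feature with an AU embedding via a learned gate (hypothetical)."""
    au_emb = np.tanh(au_vec @ w_au)  # project AU activations into feature space
    gate = sigmoid(np.concatenate([img_feat, au_emb]) @ w_gate + b_gate)
    # Convex combination per dimension: gate=1 keeps image, gate=0 keeps AU.
    return gate * img_feat + (1.0 - gate) * au_emb

# Toy usage with random weights (stand-ins for learned parameters).
rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 8))  # 16 patch tokens, 8-dim features
cls = rng.standard_normal(8)            # stand-in for a learnable class token
kept, idx, scores = select_patches(patches, cls, keep_ratio=0.25)
pooled = kept.mean(axis=0)              # simple mean pooling of kept patches

au = np.zeros(12)
au[[3, 5]] = 1.0                        # two active AUs (indices are arbitrary)
w_au = rng.standard_normal((12, 8)) * 0.1
w_gate = rng.standard_normal((16, 8)) * 0.1
b_gate = np.zeros(8)
fused = au_gated_fusion(pooled, au, w_au, w_gate, b_gate)
print(kept.shape, fused.shape)  # (4, 8) (8,)
```

Here, a gate lets the network decide per feature dimension how much AU evidence to inject. This is one simple way to make the fusion "dynamic", that is, dependent on the input. The paper's actual masking criterion, token count, and fusion equations may differ.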

Keywords: dynamic fusion; learnable class token; micro-expression recognition.
