Micro Expression Recognition via Dual-Stream Spatiotemporal Attention Network

Yan Wang; Yikun Huang; Can Liu; Xiaoying Gu; Dandan Yang; Shuopeng Wang; Bo Zhang

doi:10.1155/2021/7799100

J Healthc Eng. 2021 Aug 17:2021:7799100. doi: 10.1155/2021/7799100. eCollection 2021.

Authors

Yan Wang¹, Yikun Huang², Can Liu³, Xiaoying Gu¹, Dandan Yang¹, Shuopeng Wang¹, Bo Zhang¹

Affiliations

¹ College of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China.
² Concord University College of Fujian Normal University, Fuzhou, Fujian 350117, China.
³ School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.

Abstract

Microexpressions can reveal a person's real mood and have attracted wide attention in clinical diagnosis and depression analysis. Their short duration and subtle movement changes make discriminative spatiotemporal features hard to learn from small data sets. To address this problem, we present a dual-stream spatiotemporal attention network (DSTAN) that integrates a dual-stream spatiotemporal network with attention mechanisms to capture the deformation features and spatiotemporal features of microexpressions when only small samples are available. The spatiotemporal networks in DSTAN are based on two lightweight networks: the spatiotemporal appearance network (STAN), which learns appearance features from microexpression sequences, and the spatiotemporal motion network (STMN), which learns motion features from optical flow sequences. To focus on the discriminative motion areas of microexpressions, we construct a novel attention mechanism for the spatial model of STAN and STMN, comprising a multiscale kernel spatial attention mechanism and a global dual-pool channel attention mechanism. To obtain the importance of each frame in the microexpression sequence, we design a temporal attention mechanism for the temporal model of STAN and STMN, forming spatiotemporal appearance network-attention (STAN-A) and spatiotemporal motion network-attention (STMN-A), which can adaptively perform dynamic feature refinement. Finally, a feature concatenation-SVM method is used to integrate STAN-A and STMN-A into a novel network, DSTAN. Extensive experiments on three small spontaneous microexpression data sets (SMIC, CASME, and CASME II) demonstrate that the proposed DSTAN can effectively recognize microexpressions.
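
The abstract does not give the exact form of the temporal attention or of the fusion step, so the following is only a minimal sketch of the general idea under stated assumptions: per-frame features are weighted by a softmax over scalar scores (a generic attention pooling, not necessarily the authors' formulation), and the two stream descriptors are concatenated before being passed to a classifier such as an SVM (the SVM itself is omitted here). All function names, the scoring vector `score_weights`, and the toy feature values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention_pool(frame_features, score_weights):
    """Pool per-frame features into one vector using softmax frame weights.

    Assumption: each frame is scored by a dot product with a fixed vector.
    In a trained network this scoring would be learned.
    """
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, frame_features)) for d in range(dim)]
    return pooled, weights

def fuse_streams(appearance_vec, motion_vec):
    """Concatenate the two stream descriptors into one feature vector."""
    return list(appearance_vec) + list(motion_vec)

# Toy per-frame features (hypothetical values): 3 frames, 2 dimensions each.
appearance_frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]  # appearance stream
motion_frames = [[0.0, 0.1], [0.7, 0.6], [0.1, 0.0]]      # optical-flow stream
score_weights = [1.0, 1.0]                                # hypothetical scoring vector

appearance_vec, appearance_w = temporal_attention_pool(appearance_frames, score_weights)
motion_vec, motion_w = temporal_attention_pool(motion_frames, score_weights)

# The fused vector is what a downstream classifier (e.g., an SVM) would consume.
fused = fuse_streams(appearance_vec, motion_vec)
```

In this toy input the middle frame has the largest activations, so it receives the largest attention weight in both streams, which mirrors the stated goal of emphasizing the most informative frames of a short sequence.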

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Motion
Movement*