Face-mask-aware Facial Expression Recognition based on Face Parsing and Vision Transformer

Bo Yang; Jianming Wu; Kazushi Ikeda; Gen Hattori; Masaru Sugano; Yusuke Iwasawa; Yutaka Matsuo

doi:10.1016/j.patrec.2022.11.004

Pattern Recognit Lett. 2022 Dec:164:173-182. doi: 10.1016/j.patrec.2022.11.004. Epub 2022 Nov 9.

Authors

Bo Yang¹ ², Jianming Wu¹, Kazushi Ikeda¹, Gen Hattori¹, Masaru Sugano¹, Yusuke Iwasawa², Yutaka Matsuo²

Affiliations

¹ KDDI Research, Inc., 2-1-15 Ohara, Fujimino-shi, Saitama, 356-8502, Japan.
² The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8654 Japan.

Abstract

As wearing face masks has become an established practice due to the COVID-19 pandemic, facial expression recognition (FER) that takes face masks into account is now a problem that needs to be solved. In this paper, we propose a method based on face parsing and a vision Transformer to improve the accuracy of face-mask-aware FER. First, to distinguish more precisely between the unobstructed facial region and the parts of the face covered by a mask, we re-train a face-mask-aware face parsing model on an existing face parsing dataset that is automatically relabeled with a face mask and pixel label. Second, we propose an FER classifier based on a vision Transformer with a cross-attention mechanism, which takes both occluded and non-occluded facial regions into account and reweighs these two parts automatically to obtain the best FER performance. The proposed method outperforms existing state-of-the-art face-mask-aware FER methods, as well as other occlusion-aware FER methods, on two datasets that contain three kinds of emotions (M-LFW-FER and M-KDDI-FER) and two datasets that contain seven kinds of emotions (M-FER-2013 and M-CK+).
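
To make the cross-attention idea concrete, the following is a minimal sketch and not the authors' architecture. It assumes each facial region (visible and mask-covered) has already been encoded as a short list of token vectors, uses plain scaled dot-product attention without learned projections, and fuses the two regions with a softmax gate. The names `cross_attention`, `fuse_regions`, and `gate_logits` are hypothetical and chosen only for illustration; in a trained model the gate and the projections would be learned.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query token attends over all
    key/value tokens. Identity projections are an assumption made here."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def fuse_regions(visible_tokens, masked_tokens, gate_logits=(0.0, 0.0)):
    """Cross-attend in both directions, then reweigh the two pooled
    results with a softmax gate (hypothetical stand-in for a learned gate)."""
    # Visible-region queries read information from the masked region.
    vis_reads_masked = cross_attention(visible_tokens, masked_tokens, masked_tokens)
    # Masked-region queries read information from the visible region.
    masked_reads_vis = cross_attention(masked_tokens, visible_tokens, visible_tokens)
    w_a, w_b = softmax(list(gate_logits))
    a = mean_pool(vis_reads_masked)
    b = mean_pool(masked_reads_vis)
    return [w_a * x + w_b * y for x, y in zip(a, b)]


# Toy usage: one 2-dimensional token per region, equal gate weights.
fused = fuse_regions([[1.0, 0.0]], [[0.0, 1.0]])
```

With equal gate logits the two directions contribute equally; raising the first logit shifts weight toward the features that the visible region reads from the masked region. The fused vector would then feed a classification head, which is omitted here.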

Keywords: 41A05; 41A10; 65D05; 65D17; COVID-19; Deep learning; Face mask; Face parsing; Facial expression recognition; Vision transformer.