Siamese Architecture-Based 3D DenseNet with Person-Specific Normalization Using Neutral Expression for Spontaneous and Posed Smile Classification

Sensors (Basel). 2020 Dec 15;20(24):7184. doi: 10.3390/s20247184.

Abstract

Clinical studies have demonstrated that spontaneous and posed smiles have spatiotemporal differences in facial muscle movements, such as laterally asymmetric movements, which use different facial muscles. In this study, a model was developed in which video classification of the two types of smile was performed using a 3D convolutional neural network (CNN) applying a Siamese network, and using a neutral expression as reference input. The proposed model makes the following contributions. First, the developed model solves the problem caused by the differences in appearance between individuals, because it learns the spatiotemporal differences between the neutral expression of an individual and spontaneous and posed smiles. Second, using a neutral expression as an anchor improves the model accuracy, when compared to that of the conventional method using genuine and imposter pairs. Third, by using a neutral expression as an anchor image, it is possible to develop a fully automated classification system for spontaneous and posed smiles. In addition, visualizations were designed for the Siamese architecture-based 3D CNN to analyze the accuracy improvement, and to compare the proposed and conventional methods through feature analysis, using principal component analysis (PCA).

Keywords: 3D CNN; affective computing; automatic facial expression analysis; deep metric learning; smile classification.

MeSH terms

  • Facial Expression*
  • Facial Muscles
  • Humans
  • Neural Networks, Computer
  • Principal Component Analysis
  • Smiling*