Hierarchical Context-Based Emotion Recognition With Scene Graphs

IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3725-3739. doi: 10.1109/TNNLS.2022.3196831. Epub 2024 Feb 29.

Abstract

To better infer intentions, we often try to gauge the emotional states of other people in social communication. Many studies in affective computing infer emotions by perceiving human states, e.g., facial expression and body posture. Such methods perform well in controlled environments. However, in unconstrained circumstances they often misestimate emotions because effective inputs are lacking, which is where context-aware emotion recognition comes in. We take inspiration from the advanced reasoning patterns humans use in perceived emotion recognition and propose a hierarchical context-based emotion recognition method with scene graphs. We extract three contexts from the image, i.e., the entity context, the global context, and the scene context. The scene context contains abstract information about entity labels and their relationships, resembling the information processing of the human visual sensing mechanism. These contexts are then fused to perform emotion recognition. We carried out extensive experiments on widely used context-aware emotion datasets, i.e., CAER-S, EMOTIC, and the BOdy Language Dataset (BoLD). We demonstrate that hierarchical contexts benefit emotion recognition, improving the state-of-the-art (SOTA) accuracy on CAER-S from 84.82% to 90.83%. Ablation experiments show that the hierarchical contexts provide complementary information. Our method improves the SOTA F1 score on EMOTIC from 29.33% to 30.24% (C-F1). We also construct an image-based emotion recognition task, BoLD-Img, from BoLD and obtain an improved emotion recognition score (ERS) of 0.2153.
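The fusion step described above can be sketched in miniature: three context feature vectors (entity, global, and scene) are combined and mapped to a distribution over emotion categories. This is only an illustrative sketch with made-up dimensions and a simple concatenation-plus-linear-classifier fusion; the paper's actual architecture, feature extractors, and dimensions differ.

```python
import numpy as np

def fuse_contexts(entity_feat, global_feat, scene_feat, W, b):
    """Fuse three context features and classify into emotion categories.

    A hypothetical minimal fusion: concatenate the three context
    vectors, apply a linear layer, and normalize with softmax.
    """
    fused = np.concatenate([entity_feat, global_feat, scene_feat])
    logits = W @ fused + b
    # numerically stable softmax over emotion classes
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy dimensions (hypothetical, for illustration only).
d_entity, d_global, d_scene, n_classes = 8, 8, 8, 7
rng = np.random.default_rng(0)
W = rng.normal(size=(n_classes, d_entity + d_global + d_scene))
b = np.zeros(n_classes)

probs = fuse_contexts(rng.normal(size=d_entity),
                      rng.normal(size=d_global),
                      rng.normal(size=d_scene),
                      W, b)
```

In practice, the three streams would each come from their own backbone (e.g., an entity detector, a whole-image encoder, and a scene-graph encoder), and the fusion could be attention-based rather than a plain concatenation.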

MeSH terms

  • Emotions*
  • Facial Expression
  • Humans
  • Neural Networks, Computer*
  • Recognition, Psychology