A Hybrid Multimodal Emotion Recognition Framework for UX Evaluation Using Generalized Mixture Functions

Muhammad Asif Razzaq; Jamil Hussain; Jaehun Bang; Cam-Hao Hua; Fahad Ahmed Satti; Ubaid Ur Rehman; Hafiz Syed Muhammad Bilal; Seong Tae Kim; Sungyoung Lee

doi:10.3390/s23094373

Sensors (Basel). 2023 Apr 28;23(9):4373. doi: 10.3390/s23094373.

Authors

Muhammad Asif Razzaq¹,², Jamil Hussain³, Jaehun Bang⁴, Cam-Hao Hua², Fahad Ahmed Satti²,⁵, Ubaid Ur Rehman²,⁵, Hafiz Syed Muhammad Bilal⁵, Seong Tae Kim², Sungyoung Lee²

Affiliations

¹ Department of Computer Science, Fatima Jinnah Women University, Rawalpindi 46000, Pakistan.
² Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea.
³ Department of Data Science, Sejong University, Seoul 30019, Republic of Korea.
⁴ Hanwha Corporation/Momentum, Hanwha Building, 86 Cheonggyecheon-ro, Jung-gu, Seoul 04541, Republic of Korea.
⁵ Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan.

Abstract

Multimodal emotion recognition has gained much traction in affective computing, human-computer interaction (HCI), artificial intelligence (AI), and user experience (UX). There is a growing demand to automate the analysis of user emotion in HCI, AI, and UX evaluation applications in order to provide affective services. Emotions are increasingly obtained from video, audio, text, or physiological signals. This has led to the processing of emotions from multiple modalities, usually combined through ensemble-based systems with static weights. Because of limitations such as missing modality data, inter-class variations, and intra-class similarities, an effective weighting scheme is required to improve discrimination between modalities. This article takes into account the importance of the differences between modalities and assigns dynamic weights to them by adopting a more efficient combination process based on generalized mixture (GM) functions. We therefore present a hybrid multimodal emotion recognition (H-MMER) framework that uses a multi-view learning approach for unimodal emotion recognition and introduces multimodal feature-level fusion and decision-level fusion using GM functions. In an experimental study, we evaluated the ability of the proposed framework to model a set of four emotional states (Happiness, Neutral, Sadness, and Anger) and found that most of them can be modeled well, with significantly high accuracy, using GM functions. The experiment shows that the proposed framework can model emotional states with an average accuracy of 98.19% and indicates a significant performance gain over traditional approaches. The overall evaluation results indicate that we can identify emotional states with high accuracy and increase the robustness of an emotion classification system required for UX measurement.
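The difference between static ensemble weights and the dynamic weights of a generalized mixture function can be illustrated with a minimal sketch. A GM function combines inputs as a weighted sum in which each weight is itself a function of the inputs and the weights sum to one. The particular weight rule used here (each weight proportional to its own score), the modality names, and the function names `gm_fuse` and `fuse_decisions` are assumptions made for illustration only; the abstract does not state which GM functions or modalities the framework uses.

```python
from typing import Dict, List

# The four emotional states named in the abstract.
EMOTIONS = ["Happiness", "Neutral", "Sadness", "Anger"]


def gm_fuse(scores: List[float]) -> float:
    """Combine one class's scores from several modalities with a GM function.

    Assumed weight rule (illustrative, not taken from the paper):
    w_i(x) = x_i / sum(x), so the weights depend on the inputs and sum to 1.
    """
    total = sum(scores)
    if total == 0.0:
        return 0.0  # every modality gave this class zero support
    weights = [s / total for s in scores]
    return sum(w * s for w, s in zip(weights, scores))


def fuse_decisions(per_modality: Dict[str, List[float]]) -> str:
    """Decision-level fusion: fuse class probabilities, return the top emotion.

    A missing modality is simply left out of the dictionary, so the weights
    are recomputed over whichever modalities are present.
    """
    fused = [
        gm_fuse([probs[k] for probs in per_modality.values()])
        for k in range(len(EMOTIONS))
    ]
    best = max(range(len(EMOTIONS)), key=lambda k: fused[k])
    return EMOTIONS[best]


if __name__ == "__main__":
    # Hypothetical unimodal classifier outputs (made-up numbers).
    outputs = {
        "audio": [0.6, 0.2, 0.1, 0.1],
        "video": [0.3, 0.4, 0.2, 0.1],
        "text": [0.5, 0.3, 0.1, 0.1],
    }
    print(fuse_decisions(outputs))
```

Because the weights are recomputed from each input vector, a modality that is confident about a class contributes more to that class's fused score than one that is not, which a fixed weight vector cannot express.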

Keywords: audio-based emotion recognition; decision fusioning; emotion recognition; feature fusioning; generalized mixture function; user experience.

MeSH terms

Algorithms*
Artificial Intelligence*
Electroencephalography / methods
Emotions / physiology
Humans
Learning
Recognition, Psychology

Grants and funding

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2022-0-00078, Explainable Logical Reasoning for Medical Knowledge Generation) and (IITP-2017-0-00655, Lean UX core technology and platform for any digital artifacts UX evaluation) and by the Grand Information Technology Research Center support program (IITP-2022-2020-0-01489).