Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study

Zohar Elyoseph; Elad Refoua; Kfir Asraf; Maya Lvovsky; Yoav Shimoni; Dorit Hadar-Shoval

doi:10.2196/54369

JMIR Ment Health. 2024 Feb 6:11:e54369. doi: 10.2196/54369.

Authors

Zohar Elyoseph^#¹,², Elad Refoua³, Kfir Asraf⁴, Maya Lvovsky⁴, Yoav Shimoni⁵, Dorit Hadar-Shoval⁴

Affiliations

¹ Department of Educational Psychology, The Center for Psychobiological Research, The Max Stern Yezreel Valley College, Emek Yezreel, Israel.
² Imperial College London, London, United Kingdom.
³ Department of Psychology, Bar-Ilan University, Ramat Gan, Israel.
⁴ Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel.
⁵ Boston Children's Hospital, Boston, MA, United States.

^# Contributed equally.

PMID: 38319707
PMCID: PMC10879976
DOI: 10.2196/54369

Abstract

Background: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.

Objective: This study aimed to critically evaluate how well ChatGPT-4 and Google Bard discern visual mentalizing indicators, as contrasted with their text-based mentalizing abilities.

Methods: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.

Results: ChatGPT-4 displayed a pronounced ability in emotion recognition, scoring 26 and 27 in 2 distinct evaluations, which deviated significantly from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the person pictured or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, with scores of 10 and 12, rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT-4 and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.
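The comparison against random responding reported in the Results can be illustrated with a short calculation. The sketch below is not the authors' analysis: the abstract does not name the statistical test used, so a one-sided exact binomial test is assumed here. It also assumes the standard revised adult form of the Reading the Mind in the Eyes Test, with 36 items and 4 response options per item (chance accuracy of 0.25); neither figure is stated in the abstract. The names `N_ITEMS`, `P_CHANCE`, and `binomial_upper_tail` are hypothetical and introduced only for illustration.

```python
from math import comb

# Assumptions (not stated in the abstract): the revised adult RMET has
# 36 items, each with 4 response options, so random guessing succeeds
# with probability 0.25 per item.
N_ITEMS = 36
P_CHANCE = 0.25


def binomial_upper_tail(k: int, n: int = N_ITEMS, p: float = P_CHANCE) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    This is the one-sided exact binomial p-value for observing a score
    of k or higher if every answer were a random guess.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Scores reported in the abstract for the two evaluations of each model.
reported_scores = {"ChatGPT-4": (26, 27), "Google Bard": (10, 12)}

if __name__ == "__main__":
    print(f"Expected score under random guessing: {N_ITEMS * P_CHANCE:.1f}")
    for model, scores in reported_scores.items():
        for score in scores:
            p_value = binomial_upper_tail(score)
            print(f"{model}: score {score}/{N_ITEMS}, P(X >= {score}) = {p_value:.3g}")
```

Under these assumptions, scores of 26 and 27 lie far in the upper tail of the chance distribution, with tail probabilities well below .001, whereas scores of 10 and 12 sit close to the chance expectation and do not reach conventional significance. This is consistent with the pattern the abstract describes.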

Conclusions: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the importance of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.

Keywords: AI; ChatGPT; LLM; LLMs; RMET; Reading the Mind in the Eyes Test; algorithm; algorithms; artificial intelligence; early detection; early warning; emotional awareness; emotional comprehension; emotional cue; emotional cues; empathy; large language model; large language models; machine learning; mental disease; mental diseases; mental health; mental illness; mental illnesses; mentalization; mentalizing; practical model; practical models; predictive analytics; predictive model; predictive models; predictive system.

©Zohar Elyoseph, Elad Refoua, Kfir Asraf, Maya Lvovsky, Yoav Shimoni, Dorit Hadar-Shoval. Originally published in JMIR Mental Health (https://mental.jmir.org), 06.02.2024.

MeSH terms

Artificial Intelligence*
Benchmarking
Emotions*
Eye
Humans
Pilot Projects