Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment

Psychiatry Res. 2024 Mar;333:115667. doi: 10.1016/j.psychres.2023.115667. Epub 2023 Dec 10.

Abstract

In this narrative review, we survey recent empirical evaluations of AI-based language assessments and make the case that large language models are poised to change standardized psychological assessment. Artificial intelligence has been undergoing a purported "paradigm shift" initiated by new machine learning models, large language models (e.g., BERT, LLaMA, and the model behind ChatGPT). These models have achieved unprecedented accuracy on most computerized language-processing tasks, from web search to automatic machine translation and question answering, while their dialogue-based forms, like ChatGPT, have captured the interest of over a million users. The success of large language models is mostly attributed to their capability to numerically represent words in their context, long a weakness of previous attempts to automate psychological assessment from language. While potential applications for automated therapy are beginning to be studied on the heels of ChatGPT's success, here we present evidence suggesting that, with thorough validation of targeted deployment scenarios, AI's newest technology can move mental health assessment away from rating scales and toward how people naturally communicate: in language.

Keywords: Artificial intelligence; Assessment; Large language models; Psychology; Transformers.

Publication types

  • Review

MeSH terms

  • Artificial Intelligence*
  • Humans
  • Language*
  • Machine Learning