How Can the Clinical Aptitude of AI Assistants Be Assayed?

J Med Internet Res. 2023 Dec 5;25:e51603. doi: 10.2196/51603.

Abstract

Large language models (LLMs) are exhibiting remarkable performance in clinical contexts, with exemplar results ranging from expert-level attainment on medical examination questions to greater accuracy and relevance than real doctors when responding to patient queries posted on social media. The deployment of LLMs in conventional health care settings is yet to be reported, and it remains an open question what evidence should be required before such deployment is warranted. Early validation studies rely on unvalidated surrogate variables to represent clinical aptitude, and it may be necessary to conduct prospective randomized controlled trials to justify the use of an LLM for clinical advice or assistance, as potential pitfalls and pain points cannot be exhaustively predicted. This viewpoint argues that as LLMs continue to transform the field, there is an opportunity to improve the rigor of artificial intelligence (AI) research so that it rewards genuine innovation, conferring real benefits on real patients.

Keywords: AI; ChatGPT; LLM; artificial general intelligence; artificial intelligence; barrier; barriers; challenge; challenges; chatbot; chatbots; clinical decision aid; conversational agent; conversational agents; foundation models; implementation; language model; large language models; pain point; pain points; pitfall; pitfalls; validation.

MeSH terms

  • Aptitude*
  • Artificial Intelligence*
  • Clinical Competence*
  • Humans
  • Language
  • Pain
  • Prospective Studies