The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review

Madison Milne-Ives; Caroline de Cock; Ernest Lim; Melissa Harper Shehadeh; Nick de Pennington; Guy Mole; Eduardo Normando; Edward Meinert

doi:10.2196/20346

J Med Internet Res. 2020 Oct 22;22(10):e20346. doi: 10.2196/20346.

Authors

Madison Milne-Ives¹, Caroline de Cock¹, Ernest Lim²,³, Melissa Harper Shehadeh⁴, Nick de Pennington³,⁵, Guy Mole³,⁵, Eduardo Normando², Edward Meinert¹,⁶,⁷

Affiliations

¹ Digitally Enabled PrevenTative Health Research Group, Department of Paediatrics, University of Oxford, Oxford, United Kingdom.
² Imperial College Healthcare NHS Trust, London, United Kingdom.
³ Ufonia Limited, Oxford, United Kingdom.
⁴ Institute of Global Health, University of Geneva, Geneva, Switzerland.
⁵ Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom.
⁶ Department of Primary Care and Public Health, Imperial College London, London, United Kingdom.
⁷ Centre for Health Technology, University of Plymouth, Plymouth, United Kingdom.

PMID: 33090118
PMCID: PMC7644372
DOI: 10.2196/20346

Abstract

Background: The high demand for health care services and the growing capability of artificial intelligence have led to the development of conversational agents designed to support a variety of health-related activities, including behavior change, treatment support, health monitoring, training, triage, and screening support. Automation of these tasks could free clinicians to focus on more complex work and increase the accessibility of health care services for the public. An overarching assessment of the acceptability, usability, and effectiveness of these agents in health care is needed to collate the evidence so that future development can target areas for improvement and potential for sustainable adoption.

Objective: This systematic review aims to assess the effectiveness and usability of conversational agents in health care and identify the elements that users like and dislike to inform future research and development of these agents.

Methods: PubMed, Medline (Ovid), EMBASE (Excerpta Medica dataBASE), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, and the Association for Computing Machinery Digital Library were systematically searched for articles published since 2008 that evaluated unconstrained natural language processing conversational agents used in health care. EndNote (version X9, Clarivate Analytics) reference management software was used for initial screening, and full-text screening was conducted by 1 reviewer. Data were extracted, and the risk of bias was assessed by one reviewer and validated by another.

Results: A total of 31 studies were selected and included a variety of conversational agents, including 14 chatbots (2 of which were voice chatbots), 6 embodied conversational agents (3 of which were interactive voice response calls, virtual patients, and speech recognition screening systems), 1 contextual question-answering agent, and 1 voice recognition triage system. Overall, the evidence reported was mostly positive or mixed. Usability and satisfaction performed well (27/30 and 26/31, respectively), and positive or mixed effectiveness was found in three-quarters of the studies (23/30). However, there were several limitations of the agents highlighted in specific qualitative feedback.

Conclusions: The studies generally reported positive or mixed evidence for the effectiveness, usability, and satisfactoriness of the conversational agents investigated, but qualitative user perceptions were more mixed. The quality of many of the studies was limited, and improved study design and reporting are necessary to more accurately evaluate the usefulness of the agents in health care and identify key areas for improvement. Further research should also analyze the cost-effectiveness, privacy, and security of the agents.

International Registered Report Identifier (IRRID): RR2-10.2196/16934.

Keywords: artificial intelligence; avatar; chatbot; conversational agent; digital health; intelligent assistant; speech recognition software; virtual assistant; virtual coach; virtual health care; virtual nursing; voice recognition software.

©Madison Milne-Ives, Caroline de Cock, Ernest Lim, Melissa Harper Shehadeh, Nick de Pennington, Guy Mole, Eduardo Normando, Edward Meinert. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 22.10.2020.

Publication types

Research Support, Non-U.S. Gov't
Systematic Review

MeSH terms

Artificial Intelligence / standards*
Communication
Delivery of Health Care
Female
Humans
Male