Inductive reasoning with large language models: a simulated randomized controlled trial for epilepsy

Daniel M Goldenholz; Shira R Goldenholz; Sara Habib; M Brandon Westover

doi:10.1101/2024.03.18.24304493

medRxiv [Preprint]. 2024 Mar 19:2024.03.18.24304493. doi: 10.1101/2024.03.18.24304493.

Authors

Daniel M Goldenholz¹ ², Shira R Goldenholz², Sara Habib¹ ², M Brandon Westover¹ ²

Affiliations

¹ Department of Neurology, Harvard Medical School, Boston, USA.
² Department of Neurology, Beth Israel Deaconess Medical Center, Boston, USA.

Abstract

Importance: The analysis of electronic medical records at scale to learn from clinical experience is currently very challenging. The integration of artificial intelligence (AI), specifically foundational large language models (LLMs), into an analysis pipeline may overcome some of the current limitations of modest input sizes, inaccuracies, biases, and incomplete knowledge bases.

Objective: To explore the effectiveness of using an LLM for generating realistic clinical data and other LLMs for summarizing and synthesizing information in a model system, simulating a randomized clinical trial (RCT) in epilepsy to demonstrate the potential of inductive reasoning via medical chart review.

Design: An LLM-generated simulated RCT based on an RCT of the antiseizure medication cenobamate, including a placebo arm and a full-strength drug arm, evaluated by an LLM-based pipeline versus a human reader.

Setting: Simulation based on realistic seizure diaries, treatment effects, reported symptoms and clinical notes generated by LLMs with multiple different neurologist writing styles.

Participants: Simulated cohort of 240 patients, divided 1:1 into placebo and drug arms.

Intervention: Use of LLMs to generate clinical notes and to synthesize data from these notes, with the aim of evaluating the efficacy and safety of cenobamate in seizure control using either a human evaluator or an AI pipeline.

Measures: The AI and human analysis focused on identifying the number of seizures, symptom reports, and treatment efficacy, with statistical analysis comparing the 50%-responder rate and median percentage change between the placebo and drug arms, as well as side effect rates in each arm.
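The abstract does not say how these outcome measures were computed. As a minimal sketch under the conventional definitions (percentage change in seizure frequency relative to baseline, and a 50% responder being a patient with at least a 50% reduction), the two efficacy metrics could be calculated per arm as follows. The function names and the seizure counts are hypothetical illustrations, not the authors' code or data.

```python
from statistics import median


def percent_change(baseline, treatment):
    """Percentage change in seizure frequency from baseline.

    Negative values mean fewer seizures during treatment (improvement).
    Assumes both values are rates over comparable time windows.
    """
    if baseline <= 0:
        raise ValueError("baseline seizure rate must be positive")
    return 100.0 * (treatment - baseline) / baseline


def responder_rate_50(baselines, treatments):
    """Fraction of patients with at least a 50% reduction from baseline."""
    changes = [percent_change(b, t) for b, t in zip(baselines, treatments)]
    return sum(c <= -50.0 for c in changes) / len(changes)


def median_percent_change(baselines, treatments):
    """Median of the per-patient percentage changes."""
    return median(percent_change(b, t) for b, t in zip(baselines, treatments))


# Illustrative (made-up) monthly seizure counts for four patients in one arm.
example_baseline = [10, 8, 12, 6]
example_treatment = [4, 8, 6, 3]

rr50 = responder_rate_50(example_baseline, example_treatment)
mpc = median_percent_change(example_baseline, example_treatment)
print(f"50%-responder rate: {rr50:.2f}, median percent change: {mpc:.1f}%")
```

Applying the same two functions to the placebo arm and to the drug arm gives the pair of values that a between-arm statistical comparison would then use.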

Results: The AI analysis closely mirrored the human analysis. Both demonstrated the drug's efficacy, with marginal differences (<3%) between them in identifying drug efficacy and reported symptoms.

Conclusions and relevance: This study showcases the potential of LLMs to accurately simulate and analyze clinical trials. In particular, it highlights the ability of LLMs to reconstruct essential trial elements, identify treatment effects, and recognize reported symptoms within a realistic clinical framework. The findings underscore the relevance of LLMs in future clinical research, offering a scalable, efficient alternative to traditional data mining methods without the need for specialized medical language training.

Keywords: artificial intelligence; epilepsy; large language models; randomized clinical trials.

Publication types

Preprint

Grants and funding

K23 NS124656/NS/NINDS NIH HHS/United States