Large datasets from Electronic Health Records predict seizures after ischemic strokes: A Machine Learning approach

Alain Lekoubou; Justin Petucci; Temitope Femi Ajala; Avnish Katoch; Souvik Sen; Vasant Honavar

doi:10.1101/2024.01.24.24301755

medRxiv [Preprint]. 2024 Jan 26:2024.01.24.24301755. doi: 10.1101/2024.01.24.24301755.

Authors

Alain Lekoubou¹, Justin Petucci²,³, Temitope Femi Ajala⁴, Avnish Katoch³, Souvik Sen⁵, Vasant Honavar²,³,⁶,⁷,⁸

Affiliations

¹ Department of Neurology, Milton S. Hershey Medical Center and Department of Public Health, Pennsylvania State University.
² Institute for Computational and Data Sciences.
³ Clinical and Translational Sciences Institute.
⁴ Alabama Department of Public Health.
⁵ University of South Carolina, Department of Neurology.
⁶ Data Sciences Program.
⁷ College of Information Sciences and Technology.
⁸ Center for Artificial Intelligence Foundations and Scientific Applications.

Abstract

Objective: To develop a machine learning prediction model for estimating the risk of seizures at 1 year and 5 years after ischemic stroke (IS), using a large dataset from Electronic Health Records.

Background: Seizures are frequent after ischemic strokes and are associated with increased mortality, poor functional outcomes, and lower quality of life. Separating patients at high risk of seizures from those at low risk is needed for treatment and clinical trial planning, but remains challenging. Machine learning (ML) is a potential approach to address this problem.

Design/methods: We identified patients (aged ≥18 years) with IS without a prior diagnosis of seizures from 2015 until inception (08/09/22) in the TriNetX Research Network, using the International Classification of Diseases, Tenth Revision (ICD-10) code I63, excluding I63.6 (venous infarction). The outcome of interest was any ICD-10 diagnosis of seizures (G40/G41) at 1 year and 5 years following the index IS. We applied a conventional logistic regression and a Light Gradient Boosted Machine algorithm to predict the risk of seizures at 1 year and 5 years. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), F1 statistic, accuracy, balanced accuracy, precision, and recall, with and without anti-seizure medication use in the models.
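The threshold-based metrics listed above, and the AUROC, can be illustrated with a minimal sketch. It uses only the Python standard library and a small made-up set of labels and risk scores; the function names, the toy data, and the 0.5 decision threshold are assumptions for illustration and are not taken from the study, which does not report its implementation here. AUPRC is omitted for brevity.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, balanced accuracy, precision, recall and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 patients with seizures (label 1) out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.8, 0.4, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = threshold_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics)
print("AUROC:", auc)
```

With a rare outcome such as post-stroke seizures, accuracy alone can look high even when few true cases are found, which is one reason balanced accuracy, precision, and recall are reported alongside it.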

Results: Our study cohort included 430,254 IS patients. Seizures were present in 18,502 (4.3%) patients within 1 year and in 5.3% of patients within 5 years after IS. At 1 year, the AUROC, AUPRC, F1 statistic, accuracy, balanced accuracy, precision, and recall were, respectively, 0.7854 (standard error: 0.0038), 0.2426 (0.0048), 0.2299 (0.0034), 0.8236 (0.001), 0.7226 (0.0049), 0.1415 (0.0021), and 0.6122 (0.0095). Corresponding metrics at 5 years were 0.7607 (0.0031), 0.247 (0.0064), 0.2441 (0.0032), 0.8125 (0.0013), 0.7001 (0.0045), 0.155 (0.002), and 0.5745 (0.0095).
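The reported metrics are linked by standard identities, which the following sketch uses as a rough consistency check. It recomputes F1 from precision and recall, derives the implied specificity from balanced accuracy and recall, and then reconstructs accuracy from recall, specificity, and outcome prevalence. Two assumptions apply: the cohort-level prevalences (4.3% and 5.3%) are taken to hold in the evaluation data, and the identities are applied to the reported point estimates, which may be averages over resamples and so need only agree approximately. The implied specificity is a derived quantity, not a value reported in the abstract.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Point estimates copied from the Results paragraph; prevalence is the
# cohort-level seizure proportion (assumed to apply to the evaluation data).
reported = {
    "1-year": {"precision": 0.1415, "recall": 0.6122, "f1": 0.2299,
               "accuracy": 0.8236, "balanced_accuracy": 0.7226, "prevalence": 0.043},
    "5-year": {"precision": 0.155, "recall": 0.5745, "f1": 0.2441,
               "accuracy": 0.8125, "balanced_accuracy": 0.7001, "prevalence": 0.053},
}

checks = {}
for horizon, m in reported.items():
    f1 = f1_from(m["precision"], m["recall"])
    # balanced accuracy = (recall + specificity) / 2, solved for specificity
    specificity = 2 * m["balanced_accuracy"] - m["recall"]
    # accuracy = prevalence * recall + (1 - prevalence) * specificity
    accuracy = m["prevalence"] * m["recall"] + (1 - m["prevalence"]) * specificity
    checks[horizon] = {"f1": f1, "specificity": specificity, "accuracy": accuracy}
    print(f"{horizon}: F1 {f1:.4f} (reported {m['f1']}), "
          f"implied specificity {specificity:.4f}, "
          f"implied accuracy {accuracy:.4f} (reported {m['accuracy']})")
```

Under these assumptions the recomputed F1 and accuracy agree with the reported values to about three decimal places. The 1-year precision of 0.1415 means that roughly one in seven patients flagged as high risk had a recorded seizure diagnosis.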

Conclusion: Our findings suggest that ML models perform well in predicting seizures after IS.

Keywords: Ischemic stroke; TriNetX; machine learning; prediction; seizures.

Publication types

Preprint

Grants and funding

UL1 TR002014/TR/NCATS NIH HHS/United States