An exploration on the machine-learning-based stroke prediction model

Shenshen Zhi; Xiefei Hu; Yan Ding; Huajian Chen; Xun Li; Yang Tao; Wei Li

doi:10.3389/fneur.2024.1372431

Front Neurol. 2024 Apr 29:15:1372431. doi: 10.3389/fneur.2024.1372431. eCollection 2024.

Authors

Shenshen Zhi¹, Xiefei Hu², Yan Ding³, Huajian Chen³, Xun Li³, Yang Tao⁴, Wei Li³

Affiliations

¹ Department of Blood Transfusion, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China.
² Medicine School of Chongqing University, Chongqing, China.
³ Clinical Laboratory, Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China.
⁴ Intensive Care Unit, Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China.

Abstract

Introduction: With the rapid development of artificial intelligence technology, machine learning algorithms have been widely applied to stroke diagnosis, treatment, and prognosis, demonstrating significant potential. A correlation between stroke and cytokine levels in the human body has recently been reported. Our study aimed to establish machine-learning models based on cytokine features to support the decision-making of clinical physicians.

Methods: This study recruited 2346 stroke patients and 2128 healthy control subjects from Chongqing University Central Hospital. Predictive models were established from clinical experiments together with clinical laboratory tests and demographic variables collected at admission. Three classification algorithms were employed: Random Forest, Gradient Boosting Machine, and Support Vector Machine. The models were evaluated using ROC curves, AUC values, and calibration curves.
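As an illustration of the AUC metric named in the Methods, the following minimal sketch computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are invented for illustration, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Return AUC via pairwise comparison; ties between a case and a control count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # scores of cases
    neg = [s for y, s in zip(labels, scores) if y == 0]  # scores of controls
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for this sketch (1 = stroke, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect ranking.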

Results: Through univariate feature selection, we selected 14 features and constructed three machine-learning models: Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Machine (GBM). In the training set, the RF model outperformed the GBM and SVM models in terms of both AUC and sensitivity. We ranked the features using the RF algorithm; IL-6, IL-5, IL-10, and IL-2 had high importance scores and ranked at the top. In the test set, the stroke model demonstrated good generalization ability, as evidenced by the ROC curve, confusion matrix, and calibration curve, confirming its reliability as a predictive model for stroke.
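The abstract does not state which univariate test was used for feature selection. The sketch below assumes, purely for illustration, that each feature is scored on its own by the absolute Welch t-statistic between cases and controls, and that the top k features are kept (k = 14 in the study, k = 2 in this toy example). The function names, feature names, and data are all hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch t-statistic between two samples; returns 0.0 if both have zero variance."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def select_top_k(rows, labels, names, k):
    """Score each feature independently and return the names of the k highest-scoring ones."""
    scored = []
    for j, name in enumerate(names):
        cases = [r[j] for r, y in zip(rows, labels) if y == 1]
        controls = [r[j] for r, y in zip(rows, labels) if y == 0]
        scored.append((abs(welch_t(cases, controls)), name))
    scored.sort(reverse=True)  # largest absolute t-statistic first
    return [name for _, name in scored[:k]]


# Toy data, made up for this sketch: three hypothetical markers, six subjects.
names = ["marker_a", "marker_b", "marker_c"]
rows = [
    [10.0, 5.0, 1.0],
    [12.0, 4.0, 2.0],
    [11.0, 6.0, 1.5],
    [2.0, 5.0, 1.0],
    [3.0, 4.5, 2.0],
    [1.0, 5.5, 1.6],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = stroke, 0 = control
selected = select_top_k(rows, labels, names, k=2)
print(selected)
```

Because each feature is scored in isolation, this kind of filter is cheap but ignores interactions between features, which is one reason a model-based ranking such as the RF importance scores reported above can give a different ordering.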

Discussion: We focused on using cytokines as features to establish stroke prediction models. Analyses of the ROC curve, confusion matrix, and calibration curve on the test set indicated that our models generalized well and could be applied to stroke prediction.
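To make the confusion matrix and calibration curve concrete, the following sketch derives sensitivity and specificity from predicted probabilities and bins those probabilities to compare mean predicted risk with the observed case fraction. The 0.5 threshold, the equal-width binning, the function names, and the toy data are assumptions made for illustration and are not taken from the study.

```python
def confusion_counts(labels, probs, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def calibration_bins(labels, probs, n_bins=5):
    """Return (mean predicted probability, observed case fraction) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    curve = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            curve.append((mean_p, frac_pos))
    return curve


# Toy data, made up for this sketch (1 = stroke, 0 = control).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]

tp, fp, tn, fn = confusion_counts(labels, probs)
sensitivity = tp / (tp + fn)  # share of true cases that are flagged
specificity = tn / (tn + fp)  # share of controls that are correctly cleared
curve = calibration_bins(labels, probs, n_bins=2)
print(sensitivity, specificity, curve)
```

A well-calibrated model produces bins where the observed case fraction is close to the mean predicted probability, so the points lie near the diagonal of the calibration plot.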

Keywords: cytokines; machine learning; prediction model; random forest model; stroke.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was sponsored by the Chongqing Advanced Medical Talents Program for young and middle-aged individuals (grant number: ZQNYXGDRCGZS2019008).