Tree-Based Risk Factor Identification and Stroke Level Prediction in Stroke Cohort Study

Biomed Res Int. 2023 Apr 10:2023:7352191. doi: 10.1155/2023/7352191. eCollection 2023.

Abstract

Objective. This study focuses on the identification of risk factors, classification of stroke level, and evaluation of the importance and interactions of various patient characteristics using cohort data from the Second Hospital of Lanzhou University. Methodology. Risk factors are identified by evaluation of the relationships between factors and response, as well as by ranking the importance of characteristics. Then, after discarding negligible factors, some well-known multicategorical classification algorithms are used to predict the level of stroke. In addition, using the Shapley additive explanation method (SHAP), factors with positive and negative effects are identified, and some important interactions for classifying the level of stroke are proposed. A waterfall plot for a specific patient is presented and used to determine the risk degree of that patient. Results and Conclusion. The results show that (1) the most important risk factors for stroke are hypertension, history of transient ischemia, and history of stroke; age and gender have a negligible impact. (2) The XGBoost model shows the best performance in predicting stroke risk; it also gives a ranking of risk factors based on their impact. (3) A combination of SHAP and XGBoost can be used to identify positive and negative factors and their interactions in stroke prediction, thereby providing helpful guidance for diagnosis.

MeSH terms

  • Algorithms*
  • Cohort Studies
  • Hospitals
  • Humans
  • Risk Factors
  • Stroke* / diagnosis
  • Stroke* / epidemiology