Use of machine learning techniques for identifying ischemic stroke instead of the rule-based methods: a nationwide population-based study

Eur J Med Res. 2024 Jan 3;29(1):6. doi: 10.1186/s40001-023-01594-6.

Abstract

Background: Many studies have evaluated stroke using claims data; most have defined ischemic stroke using a rule-based operational definition. Rule-based methods tend to overestimate the number of patients with ischemic stroke.

Objectives: We aimed to determine an appropriate algorithm for identifying stroke by applying machine learning (ML) techniques to claims data.

Methods: We obtained data from the Korean National Health Insurance Service database, which is linked to the Ilsan Hospital database (n = 30,897). The performance of the prediction models (extreme gradient boosting [XGBoost] and gated recurrent unit [GRU]) was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and calibration curves.
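The two ranking metrics named above can be illustrated with a minimal Python sketch. It is not the study's code: the function names `auroc` and `average_precision`, the toy labels, and the toy scores are all assumptions made for illustration, and AUPRC is approximated here by step-wise average precision, which may differ from the estimator the authors used.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    area = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every record that shares the current score threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical labels (1 = ischemic stroke) and model scores, for illustration only.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(f"AUROC = {auroc(toy_labels, toy_scores):.4f}")
print(f"AUPRC = {average_precision(toy_labels, toy_scores):.4f}")
```

With about 10% of patients positive, AUPRC is usually the more demanding of the two metrics, because precision depends directly on the number of false positives among the much larger negative class.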

Results: In total, 30,897 patients were enrolled in this study, 3145 of whom (10.18%) had ischemic stroke. XGBoost, a tree-based ML technique, achieved an AUROC of 94.46% and an AUPRC of 92.80%. GRU showed the highest accuracy (99.81%), precision (99.92%) and recall (99.69%).
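For readers less familiar with the threshold-based metrics reported above, the following Python sketch shows how accuracy, precision, and recall are derived from confusion-matrix counts. The helper `classification_metrics` and the counts passed to it are hypothetical, since the abstract does not report a confusion matrix; only the prevalence calculation uses figures from the abstract.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    accuracy: float
    precision: float
    recall: float


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Derive accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0 or tp + fp == 0 or tp + fn == 0:
        raise ValueError("counts must include predicted and actual positives")
    return Metrics(
        accuracy=(tp + tn) / total,   # share of all patients classified correctly
        precision=tp / (tp + fp),     # share of predicted strokes that are real
        recall=tp / (tp + fn),        # share of real strokes that are detected
    )


# Prevalence computed from the cohort figures given in the abstract.
prevalence = 3145 / 30897
print(f"prevalence = {prevalence:.2%}")

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(example)
```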

Conclusions: We proposed recurrent neural network-based deep learning techniques to improve stroke phenotyping. These techniques can be expected to produce faster and more accurate results than rule-based methods.

Keywords: Deep learning; Insurance claim analysis; Ischemic stroke; Machine learning; Phenotyping.

MeSH terms

  • Algorithms
  • Area Under Curve
  • Humans
  • Ischemic Stroke*
  • Machine Learning
  • Stroke* / diagnosis
  • Stroke* / epidemiology