Predicting the onset of diabetes-related complications after a diabetes diagnosis with machine learning algorithms

Diabetes Res Clin Pract. 2023 Oct:204:110910. doi: 10.1016/j.diabres.2023.110910. Epub 2023 Sep 16.

Abstract

Aims: Using machine learning algorithms and administrative data, we aimed to predict the risk of being diagnosed with several diabetes-related complications one, two and three years after diabetes diagnosis.
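The prediction targets described in the Aims can be framed as one binary label per patient, complication and horizon. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. The function names and the per-patient date inputs are hypothetical. A complication recorded on or before the diabetes diagnosis date is simply labelled 0 here, whereas a real study would more likely exclude such prevalent cases. Censoring due to death or loss to follow-up is ignored.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February
    when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def complication_label(diabetes_dx: date,
                       complication_dx: Optional[date],
                       horizon_years: int) -> int:
    """Hypothetical labelling rule: 1 if the complication is first recorded
    strictly after the diabetes diagnosis and no later than `horizon_years`
    years after it, otherwise 0. No record (None) also gives 0.
    Censoring is not handled."""
    if complication_dx is None:
        return 0
    cutoff = add_years(diabetes_dx, horizon_years)
    return int(diabetes_dx < complication_dx <= cutoff)


# One label per prediction horizon for an invented patient
labels = [complication_label(date(2013, 3, 1), date(2015, 6, 10), h)
          for h in (1, 2, 3)]
```

With a diabetes diagnosis on 1 March 2013 and a complication first recorded on 10 June 2015, the labels for the one-, two- and three-year horizons are 0, 0 and 1.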

Methods: We used longitudinal data from the administrative registers of 610,019 individuals in Catalonia with a diagnosis of diabetes and checked for the presence of the following complications after diabetes onset from 2013 to 2017: hypertension, renal failure, myocardial infarction, cardiovascular disease, retinopathy, congestive heart failure, cerebrovascular disease, peripheral vascular disease and stroke. We used four machine learning (ML) algorithms: logistic regression (LR), decision tree (DT), random forest (RF) and extreme gradient boosting (XGB). We assessed their prediction performance and evaluated how prediction accuracy for each complication changed over the period considered.
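Comparing the four algorithms comes down to scoring each model's predicted probabilities on the same outcomes. The abstract reports the area under the ROC curve (AUC) and accuracy, and the sketch below shows one way to compute both using only the Python standard library. It is an illustration under assumptions: the 0.5 decision threshold for accuracy is not stated in the abstract, and the model names and scores in the example are invented rather than taken from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the normalised Mann-Whitney U statistic: the probability that a
    random positive case scores higher than a random negative case, with ties
    counted as one half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Group tied scores and give each the average of their 1-based ranks
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of correct predictions after thresholding probabilities
    (the 0.5 default threshold is an assumption)."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


y_true = [0, 0, 1, 1]
# Invented predicted probabilities from two hypothetical fitted models
model_scores = {"LR": [0.1, 0.4, 0.35, 0.8], "RF": [0.2, 0.3, 0.6, 0.7]}
for name, s in model_scores.items():
    print(name, roc_auc(y_true, s), accuracy(y_true, s))
```

In this toy example the hypothetical LR scores give an AUC of 0.75 and an accuracy of 0.75, and the hypothetical RF scores give 1.0 for both. Accuracy depends on the chosen threshold and on outcome prevalence, whereas AUC does not depend on any threshold, so the two metrics can rank models differently.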

Results: 610,019 people with diabetes were included. Three years after diabetes diagnosis, area under the curve (AUC) values ranged from 60% (retinopathy) to 69% (congestive heart failure), whereas accuracy ranged from 60% (retinopathy) to 75% (hypertension). RF performed best for hypertension, myocardial infarction and retinopathy, and LR performed best for the remaining comorbidities. Shapley additive explanation (SHAP) values showed that age was associated with elevated risk for all diabetes-related complications except retinopathy. Gender, other comorbidities, co-payment level and age were the most relevant factors for predicting comorbidity diagnoses.
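SHAP values are Shapley values from cooperative game theory applied to a single prediction. Each feature receives its average marginal contribution over all coalitions of the other features, and the contributions sum to the difference between the prediction and a reference prediction. With only a few features they can be computed exactly by enumerating coalitions, as in the sketch below. This is an illustration, not the study's method. The risk model, its coefficients and the feature names are hypothetical. "Absent" features take their value from a single baseline patient, which is one common convention. Libraries such as `shap` use faster estimators, for example tree-specific algorithms for RF and XGB models.

```python
import math
from itertools import combinations
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values of model(x) relative to model(baseline).
    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features."""
    names = list(x)
    n = len(names)
    phi: Features = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for coalition in combinations(others, size):
                with_f = {f: x[f] if (f in coalition or f == name) else baseline[f]
                          for f in names}
                without_f = {f: x[f] if f in coalition else baseline[f]
                             for f in names}
                total += weight * (model(with_f) - model(without_f))
        phi[name] = total
    return phi


def toy_risk(p: Features) -> float:
    """Hypothetical logistic risk score; the coefficients are made up."""
    z = -4.0 + 0.05 * p["age"] + 0.3 * p["male"] + 0.4 * p["n_comorbidities"]
    return 1 / (1 + math.exp(-z))


patient = {"age": 70, "male": 1, "n_comorbidities": 3}
reference = {"age": 55, "male": 0, "n_comorbidities": 1}
phi = shapley_values(toy_risk, patient, reference)
```

The values in `phi` add up to `toy_risk(patient) - toy_risk(reference)`, the efficiency property that makes SHAP attributions additive. Because the invented age coefficient is positive and the patient is older than the reference, the age contribution comes out positive.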

Conclusions: Our ML models can identify individuals newly diagnosed with diabetes who are at increased risk of developing diabetes-related complications. Prediction performance varied across complications but remained within acceptable ranges for prediction tools.

Keywords: Administrative data; Deep learning; Diabetes mellitus; Diabetes-related complications; Machine learning.

MeSH terms

  • Algorithms
  • Diabetes Complications* / diagnosis
  • Diabetes Complications* / epidemiology
  • Diabetes Mellitus*
  • Heart Failure* / diagnosis
  • Heart Failure* / epidemiology
  • Heart Failure* / etiology
  • Humans
  • Hypertension*
  • Machine Learning
  • Myocardial Infarction*
  • Retinal Diseases*