A machine learning-based algorithm to identify U-500R insulin candidates among adults with type 2 diabetes mellitus in US retrospective databases

Curr Med Res Opin. 2024 Mar;40(3):367-375. doi: 10.1080/03007995.2023.2293116. Epub 2024 Jan 23.

Abstract

Objective: To develop a machine learning-based predictive algorithm to identify patients with type 2 diabetes mellitus (T2DM) who are candidates for initiation of U-500R insulin (U-500R).

Methods: A retrospective cohort of patients with T2DM was drawn from a large US administrative claims and electronic health records (EHR) database affiliated with Optum. Predictor variables derived from the data were used to identify appropriate supervised machine learning models, including least absolute shrinkage and selection operator (LASSO) and extreme gradient boosting (XGBoost) methods. Predictive performance was assessed using the area under the curve (AUC) of the precision-recall (PR) and receiver operating characteristic (ROC) curves. To support clinical interpretation of the final model, the final set of variables from the LASSO and XGBoost models was fitted to a traditional logistic regression model. Model choice was determined by comparing the Akaike Information Criterion (AIC), residual deviances, and scaled Brier scores.
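The LASSO selection step followed by an unpenalized logistic regression refit and an AIC/deviance comparison can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses synthetic data, solves the L1-penalized logistic regression by proximal gradient descent (ISTA), and uses hypothetical helper names (`fit_logistic`, `deviance`, `aic`) and an arbitrary penalty `lam=0.15`; the XGBoost arm, the hold-out split, and penalty tuning are omitted.

```python
import math
import random

# Hypothetical helpers illustrating LASSO selection + unpenalized refit; not the study's code.

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward 0 and returns exactly 0 inside [-t, t]
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_logistic(X, y, lam=0.0, lr=0.5, n_iter=500):
    """Minimize mean log-loss + lam * sum(|w_j|) by proximal gradient descent (ISTA).
    The intercept b is not penalized; lam=0 gives ordinary logistic regression."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b

def deviance(X, y, w, b):
    # Residual deviance = -2 * log-likelihood (the saturated log-likelihood is 0 for 0/1 outcomes)
    ll = 0.0
    for xi, yi in zip(X, y):
        pi = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        pi = min(max(pi, 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * ll

def aic(X, y, w, b):
    # AIC = deviance + 2k, counting the intercept and every coefficient
    return deviance(X, y, w, b) + 2 * (1 + len(w))

# Synthetic data (assumption): x0 drives the outcome, x1 is pure noise
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x0, x1 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([x0, x1])
    y.append(1 if rng.random() < sigmoid(2.0 * x0) else 0)

# Step 1: LASSO keeps the variables whose coefficients are not shrunk exactly to zero
w_lasso, _ = fit_logistic(X, y, lam=0.15)
selected = [j for j, wj in enumerate(w_lasso) if wj != 0.0]

# Step 2: refit an unpenalized logistic regression on the selected variables only
Xs = [[row[j] for j in selected] for row in X]
w_ref, b_ref = fit_logistic(Xs, y)
print("selected:", selected, "deviance:", deviance(Xs, y, w_ref, b_ref), "AIC:", aic(Xs, y, w_ref, b_ref))
```

Refitting without the penalty removes LASSO's shrinkage from the coefficients, so they can be read as ordinary log-odds ratios; the refit model's residual deviance and AIC are then the kind of quantities one would compare across candidate variable sets. In practice a library implementation, such as scikit-learn's L1-penalized `LogisticRegression`, would normally replace the hand-written solver.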

Results: Among 81,242 patients who met the study eligibility criteria, 577 initiated U-500R and were assigned to the positive class. Predictors of U-500R initiation included overweight/obesity, neuropathy, HbA1c ≥9% and 8%-9%, BUN 23.8 to <112 mg/dL, ALT 35.9-2056.2 U/L, no radiological chest examinations, no glomerular filtration rate (GFR) laboratory tests, and gait/mobility abnormalities. The best-performing model was the LASSO model, with an ROC AUC of 0.776 on the hold-out test set.
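With only 577 positives among 81,242 patients (a prevalence of roughly 0.7%), the evaluation metrics behave very differently: a non-informative classifier has an ROC AUC of 0.5 but an average precision roughly equal to the prevalence, and the scaled Brier score compares a model's Brier score with that of a model that always predicts the prevalence. The sketch below implements these metrics in plain Python on a made-up toy example. The function names are hypothetical, the abstract does not say which PR AUC estimator the study used (step-wise average precision is one common choice, assumed here), ties in `average_precision` are broken by input order rather than averaged, and the quadratic `roc_auc` loop is for illustration only, not for a cohort of this size.

```python
# Hypothetical helper names; a sketch of common metrics, not the study's evaluation code.

def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (Mann-Whitney U / (n_pos * n_neg)); tied pairs count as 1/2.
    O(n_pos * n_neg), so only suitable for small illustrative inputs."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise PR AUC: mean of the precision at the rank of each positive.
    Ties are broken by input order (sorted() is stable), not averaged."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank
    return total / sum(y_true)

def brier(y_true, probs):
    # Mean squared difference between predicted probability and the 0/1 outcome
    return sum((p - t) ** 2 for p, t in zip(probs, y_true)) / len(y_true)

def scaled_brier(y_true, probs):
    # 1 - Brier / Brier of always predicting the prevalence, which equals prev * (1 - prev)
    prev = sum(y_true) / len(y_true)
    return 1.0 - brier(y_true, probs) / (prev * (1.0 - prev))

# Toy example (made-up labels and predicted probabilities)
y = [0, 0, 1, 0, 1, 0, 0, 1]
s = [0.1, 0.4, 0.35, 0.2, 0.8, 0.05, 0.3, 0.7]
print(roc_auc(y, s))            # 14/15 ≈ 0.933
print(average_precision(y, s))  # (1 + 1 + 3/4) / 3 ≈ 0.917
print(scaled_brier(y, s))       # 1 - 0.106875 / 0.234375 ≈ 0.544

# Cohort prevalence from the Results: a random classifier's average precision is roughly this value
print(577 / 81242)              # ≈ 0.0071
```

Because the scaled Brier score is 0 for a model that always predicts the prevalence and 1 for perfect predictions, it stays interpretable even when the raw Brier score is tiny simply because positives are rare.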

Conclusion: This study successfully developed and validated a machine learning-based algorithm to identify U-500R candidates among patients with T2DM. The algorithm may help health care providers and decision-makers understand important characteristics of patients who could use U-500R therapy, which in turn could support policies and guidelines for optimal patient management.

Keywords: High dose insulin; LASSO logistic regression; US administrative claims database; XGBoost; electronic health records; machine learning model; type 2 diabetes.

MeSH terms

  • Adult
  • Algorithms
  • Diabetes Mellitus, Type 2* / drug therapy
  • Humans
  • Insulin / therapeutic use
  • Machine Learning
  • Retrospective Studies

Substances

  • Insulin