Application of Machine Learning Algorithms to Predict Uncontrolled Diabetes Using the All of Us Research Program Data

Healthcare (Basel). 2023 Apr 15;11(8):1138. doi: 10.3390/healthcare11081138.

Abstract

There is a paucity of predictive models for uncontrolled diabetes mellitus. The present study applied several machine learning algorithms to multiple patient characteristics to predict uncontrolled diabetes. Patients with diabetes older than 18 years from the All of Us Research Program were included. Random forest, extreme gradient boosting (XGBoost), logistic regression, and weighted ensemble algorithms were employed. Patients with a record of uncontrolled diabetes based on International Classification of Diseases (ICD) codes were identified as cases. The features included basic demographics, biomarkers, and hematological indices. The random forest model performed best, with an accuracy of 0.80 (95% CI: 0.79-0.81), compared with 0.74 (95% CI: 0.73-0.75) for extreme gradient boosting, 0.64 (95% CI: 0.63-0.65) for logistic regression, and 0.77 (95% CI: 0.76-0.79) for the weighted ensemble model. The area under the receiver operating characteristic curve (AUC) ranged from 0.70 (logistic regression) to 0.77 (random forest). Potassium levels, body weight, aspartate aminotransferase, height, and heart rate were important predictors of uncontrolled diabetes. In conclusion, the random forest model achieved the highest accuracy and AUC among the algorithms tested, and serum electrolytes and physical measurements were important features for predicting uncontrolled diabetes. Machine learning techniques that incorporate these clinical characteristics may be used to predict uncontrolled diabetes.
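The abstract reports accuracy with 95% confidence intervals, AUC, and a weighted ensemble. It does not say how the intervals were computed or how the ensemble weights were chosen. The sketch below is a minimal pure-Python illustration of those three metrics. It makes several assumptions: a normal-approximation (Wald) interval for accuracy, the rank-based (Mann-Whitney) definition of AUC, and a weighted average of per-model predicted probabilities. The function names, weights, 0.5 decision threshold, and toy data are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence, Tuple


def accuracy_with_ci(y_true: Sequence[int], y_pred: Sequence[int],
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Accuracy plus a normal-approximation (Wald) CI.

    Assumption: the study's CI method is not stated; Wald is used here for illustration.
    """
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    acc = correct / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    # Clamp to [0, 1] because the Wald interval can extend past the bounds.
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case (1) outscores a random control (0).

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_ensemble(prob_lists: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model predicted probabilities.

    Assumption: hypothetical soft-voting scheme; the study's weighting is not described.
    """
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = uncontrolled diabetes (case), 0 = control.
    y_true = [1, 0, 1, 1, 0, 0]
    rf_probs = [0.9, 0.6, 0.7, 0.3, 0.2, 0.1]  # e.g. random forest output
    lr_probs = [0.6, 0.4, 0.5, 0.6, 0.5, 0.3]  # e.g. logistic regression output
    ens = weighted_ensemble([rf_probs, lr_probs], weights=[0.7, 0.3])
    preds = [1 if p >= 0.5 else 0 for p in ens]  # hypothetical 0.5 threshold
    acc, lo, hi = accuracy_with_ci(y_true, preds)
    print(f"accuracy {acc:.2f} (95% CI: {lo:.2f}-{hi:.2f}), AUC {roc_auc(y_true, ens):.2f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is one reason a model can lead on one metric by a wider margin than on the other.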

Keywords: All of Us Research Program; machine learning; prediction; serum electrolytes; uncontrolled diabetes.

Grants and funding

This research received no external funding.