Data-Driven Identification of Long-Term Glycemia Clusters and Their Individualized Predictors in Finnish Patients with Type 2 Diabetes

Clin Epidemiol. 2023 Jan 5:15:13-29. doi: 10.2147/CLEP.S380828. eCollection 2023.

Abstract

Purpose: To better understand the heterogeneity among patients with type 2 diabetes (T2D), we aimed to identify groups of patients with homogeneous long-term HbA1c trajectories and to predict each patient's trajectory membership using explainable machine learning methods and a range of clinical, treatment-related, and socio-economic predictors.

Patients and methods: Electronic health record data covering primary and specialized healthcare for 9631 patients with a T2D diagnosis were extracted from the North Karelia region, Finland. Six-year HbA1c trajectories were examined with growth mixture models. Linear discriminant analysis and neural networks were applied to predict trajectory membership for individual patients.
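To make the prediction step more concrete, the following is a minimal sketch of linear discriminant analysis with a single predictor. It is not the study's model: the data are synthetic, the single feature (a baseline HbA1c value in %) and the two labels "adequate" and "inadequate" are simplifying assumptions, and the function names fit_lda and predict_lda are hypothetical.

```python
import math
from statistics import mean


def fit_lda(xs, ys):
    """Fit one-feature LDA: class means, class priors, one pooled variance."""
    classes = sorted(set(ys))
    n = len(xs)
    means, priors = {}, {}
    within_ss = 0.0
    for c in classes:
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = mean(vals)
        priors[c] = len(vals) / n
        within_ss += sum((v - means[c]) ** 2 for v in vals)
    # LDA assumes all classes share the same variance.
    pooled_var = within_ss / (n - len(classes))
    return means, priors, pooled_var


def predict_lda(model, x):
    """Return the class with the largest linear discriminant score."""
    means, priors, var = model

    def score(c):
        return x * means[c] / var - means[c] ** 2 / (2 * var) + math.log(priors[c])

    return max(means, key=score)


# Synthetic baseline HbA1c values (%), invented for illustration only.
xs = [6.0, 6.2, 6.4, 6.5, 6.6, 6.8, 7.0, 6.3, 8.5, 9.0, 9.5, 10.0]
ys = ["adequate"] * 8 + ["inadequate"] * 4

model = fit_lda(xs, ys)
low = predict_lda(model, 6.5)
high = predict_lda(model, 9.2)
print(low, high)
```

Because the discriminant score is linear in the predictor, the decision rule reduces to a single threshold here; with several predictors, the same idea uses a pooled covariance matrix instead of a pooled variance.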

Results: Three HbA1c trajectories were distinguished over six years: "stable, adequate" (86.5%), "improving, but inadequate" (7.3%), and "fluctuating, inadequate" (6.2%) glycemic control. Prior glucose levels, duration of T2D, use of insulin only, use of insulin together with some oral antidiabetic medications, and use of metformin only were the most important predictors of long-term treatment balance. The prediction model had a balanced accuracy of 85% and an area under the receiver operating characteristic curve of 91%, indicating high performance. Moreover, the results based on SHAP (Shapley additive explanations) values show that it is possible to explain the outcomes of machine learning methods at both the population and individual levels.
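Balanced accuracy matters here because the three trajectories are highly imbalanced. The sketch below uses the standard definition (the mean of per-class recall); the abstract does not state how the metric was computed, so this is an assumption. The cohort of 1000 patients is hypothetical and only mirrors the reported class proportions, and the "model" is a deliberately naive one that always predicts the majority class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of patients whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class carries equal weight."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical cohort of 1000 patients with the reported class shares.
y_true = (
    ["stable, adequate"] * 865
    + ["improving, but inadequate"] * 73
    + ["fluctuating, inadequate"] * 62
)
# Naive baseline: always predict the majority trajectory.
y_pred = ["stable, adequate"] * len(y_true)

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.3f}, balanced accuracy={bal_acc:.3f}")
```

The naive baseline reaches 86.5% plain accuracy while identifying none of the patients with inadequate control, whereas its balanced accuracy is only one third. This is why a balanced accuracy of 85% is more informative than plain accuracy for this class distribution.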

Conclusion: Heterogeneity in long-term glycemic control can be predicted with confidence by utilizing information from previous HbA1c levels, fasting plasma glucose, duration of T2D, and use of antidiabetic medications. In the future, the expected development of HbA1c could be predicted based on the patient's unique risk factors, offering clinicians a practical tool to support treatment planning.

Keywords: HbA1c; SHAP; cluster; machine learning; type 2 diabetes.

Grants and funding

This study was partly supported by the Finnish Diabetes Association, the Research Committee of the Kuopio University Hospital Catchment Area for the State Research Funding (project QCARE, Joensuu, Finland), the Strategic Research Council at the Academy of Finland (project IMPRO, 312703), and the HTx project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement N° 825162. The funders had no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.