Development and implementation of patient-level prediction models of end-stage renal disease for type 2 diabetes patients using fast healthcare interoperability resources

San Wang; Jieun Han; Se Young Jung; Tae Jung Oh; Sen Yao; Sanghee Lim; Hee Hwang; Ho-Young Lee; Haeun Lee

doi:10.1038/s41598-022-15036-6

Sci Rep. 2022 Jul 4;12(1):11232. doi: 10.1038/s41598-022-15036-6.

Authors

San Wang^#¹, Jieun Han^#², Se Young Jung³ ⁴, Tae Jung Oh⁵ ⁶, Sen Yao¹, Sanghee Lim¹, Hee Hwang⁷, Ho-Young Lee⁷, Haeun Lee⁷

Affiliations

¹ Enolink, Cambridge, USA.
² Department of Family Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea.
³ Department of Family Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea. syjung@snubh.org.
⁴ Department of Digital Healthcare, Seoul National University Bundang Hospital, 172 Dolma-ro, Bundang-gu, Seongnam, 13620, Republic of Korea. syjung@snubh.org.
⁵ Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea. ohtjmd@gmail.com.
⁶ Department of Internal Medicine, Seoul National University College of Medicine and Seoul National University Bundang Hospital, 82 Gumi-ro, Bundang-gu, Seongnam, 13620, Republic of Korea. ohtjmd@gmail.com.
⁷ Department of Digital Healthcare, Seoul National University Bundang Hospital, 172 Dolma-ro, Bundang-gu, Seongnam, 13620, Republic of Korea.

^# Contributed equally.

Abstract

This study aimed to develop a machine learning (ML) model to predict the 5-year risk of end-stage renal disease (ESRD) in patients with type 2 diabetes mellitus (T2DM), and to implement the resulting algorithm in an electronic medical record (EMR) system using Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR). The final dataset used for modeling included 19,159 patients. Features of various types were engineered from the medical data and used as input to several ML classifiers. XGBoost performed best, with an area under the receiver operating characteristic curve (AUROC) of 0.95 and an area under the precision-recall curve (AUPRC) of 0.79 in three-fold cross-validation, compared with other models such as logistic regression, random forest, and support vector machine (AUROC range, 0.929-0.943; AUPRC range, 0.765-0.792). The features ranked highest for ESRD risk prediction were serum creatinine, serum albumin, the urine albumin-to-creatinine ratio, the Charlson comorbidity index, estimated glomerular filtration rate (eGFR), and medication days of insulin. The algorithm was implemented in the EMR system using HL7 FHIR through an ML-dedicated server that preprocessed unstructured data and trained on updated data.
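The abstract reports both AUROC and AUPRC. As a rough illustration of how the two metrics differ when the outcome is rare, the following minimal Python sketch computes them on a small invented dataset. The function names, the toy labels and scores, and the choice of average precision as the AUPRC estimator are all assumptions made for illustration; the record does not state how the authors computed these values, and none of the numbers below come from the study.

```python
# Illustrative sketch only: toy data, not the study's data or code.

def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision, one common estimator of AUPRC: the mean of the
    precision values at the rank of each positive case. Assumes no
    positive case is tied in score with a negative case."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive case")
    return precision_sum / hits


# Hypothetical imbalanced cohort: 2 events among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

print(f"AUROC = {auroc(y_true, scores):.4f}")
print(f"AUPRC (average precision) = {average_precision(y_true, scores):.4f}")
```

In this toy example a single negative case ranked above one positive case costs AUPRC more than AUROC, which is one reason both metrics are commonly reported for rare outcomes.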

MeSH terms

Delivery of Health Care*
Diabetes Mellitus, Type 2*
Humans
Kidney Failure, Chronic* / therapy
Logistic Models
Machine Learning