Development and Validation of a Deep Learning Based Diabetes Prediction System Using a Nationwide Population-Based Cohort

Sang Youl Rhee; Ji Min Sung; Sunhee Kim; In-Jeong Cho; Sang-Eun Lee; Hyuk-Jae Chang

doi:10.4093/dmj.2020.0081


Diabetes Metab J. 2021 Jul;45(4):515-525. doi: 10.4093/dmj.2020.0081. Epub 2021 Feb 25.

Authors

Sang Youl Rhee^#¹, Ji Min Sung^#², Sunhee Kim³, In-Jeong Cho⁴, Sang-Eun Lee⁵, Hyuk-Jae Chang⁵

Affiliations

¹ Department of Endocrinology and Metabolism, Kyung Hee University School of Medicine, Seoul, Korea.
² Integrative Research Center for Cerebrovascular and Cardiovascular diseases, Yonsei University Health System, Yonsei University College of Medicine, Seoul, Korea.
³ Yonsei University College of Medicine, Yonsei University Health System, Seoul, Korea.
⁴ Division of Cardiology, Ewha Womans University School of Medicine, Seoul, Korea.
⁵ Division of Cardiology, Severance Cardiovascular Hospital, Yonsei University Health System, Yonsei University College of Medicine, Seoul, Korea.

^# Contributed equally.

Abstract

Background: Previously developed prediction models for type 2 diabetes mellitus (T2DM) have limited performance. We developed a deep learning (DL)-based model using a cohort representative of the Korean population.

Methods: This study was based on the National Health Insurance Service-Health Screening (NHIS-HEALS) cohort of Korea. Overall, 335,302 subjects without T2DM at baseline were included. The models were developed in 80% of the subjects and validated in the remaining 20%. Predictive models for T2DM were constructed using a recurrent neural network with long short-term memory (RNN-LSTM) and a Cox longitudinal summary model. The performance of the two models over a 10-year period was compared using the time-dependent area under the curve (AUC).
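The abstract does not say which time-dependent AUC estimator was used. As a minimal sketch, assuming a simple cumulative/dynamic definition (cases are subjects diagnosed with T2DM by horizon t, controls are subjects still followed up and diabetes-free after t) and ignoring the inverse-probability-of-censoring weights that estimators such as Uno's add, the Python below shows how a subject-level 80/20 split and a per-horizon AUC could be computed. The function names `split_subjects` and `time_dependent_auc` and the toy data are hypothetical illustrations, not the study's code or data.

```python
import random
from typing import List, Sequence, Tuple


def split_subjects(subject_ids: Sequence[str], train_frac: float = 0.8,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly assign whole subjects (not individual check-ups) to the
    development or validation set, so no person appears in both."""
    ids = list(subject_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = int(round(train_frac * len(ids)))
    return ids[:cut], ids[cut:]


def time_dependent_auc(times: Sequence[float], events: Sequence[int],
                       scores: Sequence[float], t: float) -> float:
    """Cumulative/dynamic AUC at horizon t, WITHOUT censoring weights
    (a simplifying assumption; not necessarily the study's estimator).

    Cases:    diagnosed with T2DM on or before t (event == 1, time <= t).
    Controls: still followed up and diabetes-free after t (time > t).
    Subjects censored on or before t are neither, so they are dropped.
    Returns the fraction of case-control pairs in which the case has the
    higher predicted risk; tied scores count as half a concordant pair.
    """
    cases = [s for s, tt, e in zip(scores, times, events) if e == 1 and tt <= t]
    controls = [s for s, tt in zip(scores, times) if tt > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: follow-up time (years), T2DM event flag, predicted risk.
times = [2.0, 3.5, 4.0, 6.0, 9.0, 10.0, 10.0, 1.5]
events = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.65, 0.8]

if __name__ == "__main__":
    dev, val = split_subjects([f"id{i}" for i in range(10)])
    print(len(dev), len(val))  # 8 2
    for horizon in (5.0, 8.0):
        print(horizon, round(time_dependent_auc(times, events, scores, horizon), 3))
    # 5.0 1.0
    # 8.0 0.889
```

Evaluating such a function at each yearly horizon for both the RNN-LSTM and Cox risk scores would give a year-by-year comparison of the kind summarized in the Results. Splitting by subject rather than by check-up keeps the repeated visits of one person from appearing in both the development and validation sets.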

Results: During a mean follow-up of 10.4±1.7 years, subjects underwent a mean of 2.9±1.0 periodic health check-ups. During the observation period, T2DM newly developed in 8.7% of the subjects. The annual performance of the RNN-LSTM model was superior to that of the Cox model. The risk factors for T2DM derived from the two models were similar, although some results differed.

Conclusion: The DL-based T2DM prediction model, constructed using a cohort representative of the population, performed better than the conventional Cox model. After pilot testing, this model will be provided to all Korean national health screening recipients.

Keywords: Diabetes mellitus, type 2; Mass screening; Prediabetic state; Prediction.

MeSH terms

Cohort Studies
Deep Learning*
Diabetes Mellitus, Type 2* / diagnosis
Diabetes Mellitus, Type 2* / epidemiology
Humans
Neural Networks, Computer
Risk Factors