Development and Validation of a Deep Learning Based Diabetes Prediction System Using a Nationwide Population-Based Cohort

Diabetes Metab J. 2021 Jul;45(4):515-525. doi: 10.4093/dmj.2020.0081. Epub 2021 Feb 25.

Abstract

Background: Previously developed prediction models for type 2 diabetes mellitus (T2DM) have limited performance. We developed a deep learning (DL) based model using a cohort representative of the Korean population.

Methods: This study used the National Health Insurance Service-Health Screening (NHIS-HEALS) cohort of Korea. Overall, 335,302 subjects without T2DM at baseline were included. We developed the model using 80% of the subjects and validated its predictive power in the remaining 20%. Predictive models for T2DM were constructed using a recurrent neural network with long short-term memory (RNN-LSTM) and the Cox longitudinal summary model. The performance of both models over a 10-year period was compared using the time-dependent area under the curve.
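
The validation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the random seed, and the handling of censoring are assumptions made for illustration. It shows an 80%/20% split of subjects and a simplified time-dependent area under the curve at a chosen horizon, in which subjects who developed T2DM by that time count as cases and subjects still event-free after it count as controls. Subjects censored before the horizon are simply dropped here, whereas a full analysis would normally reweight for censoring.

```python
import random


def split_subjects(subject_ids, train_fraction=0.8, seed=0):
    """Shuffle subject IDs and split them into development and validation sets.

    The seed and function name are hypothetical; the abstract only states
    that 80% of subjects were used for development.
    """
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def time_dependent_auc(risk, event_time, event_observed, horizon):
    """Simplified cumulative/dynamic AUC at a single time horizon.

    Cases: subjects with an observed event at or before `horizon`.
    Controls: subjects whose follow-up extends beyond `horizon`.
    Assumption: subjects censored before `horizon` are excluded, with no
    inverse-probability-of-censoring weighting.
    """
    cases = [r for r, t, e in zip(risk, event_time, event_observed)
             if e and t <= horizon]
    controls = [r for r, t in zip(risk, event_time) if t > horizon]
    if not cases or not controls:
        return float("nan")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    development, validation = split_subjects(range(10))
    print(len(development), len(validation))

    # Toy values only: predicted risks, years to event or censoring, event flags.
    risk = [0.9, 0.8, 0.2, 0.1]
    years = [2, 5, 12, 12]
    observed = [1, 1, 0, 0]
    print(time_dependent_auc(risk, years, observed, horizon=10))
```

Evaluating such a function at each yearly horizon from 1 to 10 would give one AUC per year, which is one way to read the annual comparison between the two models.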

Results: During a mean follow-up of 10.4±1.7 years, the mean frequency of periodic health check-ups was 2.9±1.0 per subject. During the observation period, T2DM was newly observed in 8.7% of the subjects. The annual performance of the RNN-LSTM model was superior to that of the Cox model. The risk factors for T2DM derived using the two models were similar, although certain results differed.

Conclusion: The DL-based T2DM prediction model, constructed using a cohort representative of the population, performs better than the conventional model. After pilot tests, this model will be provided to all Korean national health screening recipients.

Keywords: Diabetes mellitus, type 2; Mass screening; Prediabetic state; Prediction.

MeSH terms

  • Cohort Studies
  • Deep Learning*
  • Diabetes Mellitus, Type 2* / diagnosis
  • Diabetes Mellitus, Type 2* / epidemiology
  • Humans
  • Neural Networks, Computer
  • Risk Factors