Electronic Health Record Driven Prediction for Gestational Diabetes Mellitus in Early Pregnancy

Sci Rep. 2017 Nov 27;7(1):16417. doi: 10.1038/s41598-017-16665-y.

Abstract

Gestational diabetes mellitus (GDM) is conventionally confirmed with an oral glucose tolerance test (OGTT) at 24 to 28 weeks of gestation, but it remains uncertain whether it can be predicted in early pregnancy through secondary use of electronic health records (EHRs). To this end, a cost-sensitive hybrid model (CSHM) and five conventional machine learning methods are used to construct predictive models that capture the future risk of GDM from temporally aggregated EHRs. The experimental data come from a nested case-control study cohort of 33,935 pregnant women at West China Second Hospital. After data cleaning, 4,378 cases and 50 attributes are retained in the data set. Once the most feasible method has been selected, the cost parameter of CSHM is adapted to deal with the imbalance of the data set. In the experiment, 3,940 samples are used for training and the remaining 438 samples for testing. Although the accuracy on positive samples is barely acceptable (62.16%), the results suggest that the vast majority (98.4%) of the instances predicted positive are true positives. To our knowledge, this is the first study to apply machine learning models to EHRs to predict GDM, which will facilitate personalized medicine in maternal health management in the future.
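The two figures reported above correspond to what are usually called recall (the share of real positives that are detected) and precision (the share of predicted positives that are real). The sketch below illustrates how a misclassification cost can shift that trade-off on an imbalanced data set. It is not the CSHM described in the abstract: it uses the generic cost-based decision threshold for a binary classifier, and all names (`cost_sensitive_predict`, `cost_fn`, `cost_fp`) and the toy labels and scores are hypothetical, chosen only for illustration.

```python
# Minimal sketch of cost-sensitive thresholding plus recall/precision.
# ASSUMPTION: this is a generic textbook technique, not the paper's CSHM.
# ASSUMPTION: the labels and scores below are made-up toy values.


def cost_sensitive_predict(probs, cost_fn, cost_fp):
    """Predict 1 when the expected cost of missing a positive exceeds
    the expected cost of a false alarm.

    Predict positive if p * cost_fn > (1 - p) * cost_fp,
    which is equivalent to p > cost_fp / (cost_fp + cost_fn).
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return [1 if p > threshold else 0 for p in probs]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of real positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of predicted positives that are real positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data: 3 positives, 5 negatives (hypothetical, imbalanced).
TOY_LABELS = [1, 1, 1, 0, 0, 0, 0, 0]
TOY_SCORES = [0.9, 0.4, 0.2, 0.3, 0.1, 0.6, 0.05, 0.15]

if __name__ == "__main__":
    # Compare equal costs with a setting where a missed positive costs 3x.
    for cost_fn, cost_fp in [(1.0, 1.0), (3.0, 1.0)]:
        preds = cost_sensitive_predict(TOY_SCORES, cost_fn, cost_fp)
        tp, fp, fn, tn = confusion_counts(TOY_LABELS, preds)
        print(cost_fn, cost_fp, recall(tp, fn), precision(tp, fp))
```

Raising the cost of a missed positive lowers the decision threshold, which tends to raise recall at the expense of more false alarms. How the actual cost parameter of CSHM is defined and tuned is not stated in the abstract.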

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Algorithms
  • Cost-Benefit Analysis
  • Databases, Factual
  • Diabetes, Gestational / diagnosis
  • Diabetes, Gestational / epidemiology*
  • Diabetes, Gestational / etiology
  • Electronic Health Records* / statistics & numerical data
  • Female
  • Gestational Age
  • Humans
  • Models, Statistical
  • Pregnancy
  • Prognosis
  • ROC Curve
  • Workflow