A Bayesian network model of new-onset diabetes in older Chinese: The Guangzhou biobank cohort study

Front Endocrinol (Lausanne). 2022 Aug 3:13:916851. doi: 10.3389/fendo.2022.916851. eCollection 2022.

Abstract

Background: Existing regression-based diabetes risk prediction models are limited in handling collinearity and complex interactions. A Bayesian network (BN) model, which accounts for interactions, may provide additional information for predicting risk and inferring causation.

Methods: A BN model for new-onset diabetes was constructed using prospective data from 15,934 participants without diabetes at baseline [73% women; mean (standard deviation) age = 61.0 (6.9) years]. Participants were randomly assigned to a training set (n = 12,748) and a validation set (n = 3,186). Model performance was assessed using the area under the receiver operating characteristic curve (AUC).
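The random split and the AUC evaluation described in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_ids` and `auc`, the seed, and the four toy scores are hypothetical. The AUC is computed as the proportion of case/non-case pairs in which the case receives the higher predicted score, with ties counted as one half.

```python
import random


def split_ids(ids, n_train, seed=0):
    """Shuffle participant IDs and return (training, validation) lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return shuffled[:n_train], shuffled[n_train:]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 for participants who developed the outcome, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


# Sample sizes taken from the abstract: 15,934 participants, 12,748 for training.
train_ids, valid_ids = split_ids(range(15934), n_train=12748)
print(len(train_ids), len(valid_ids))

# Hypothetical toy scores, for illustration only (not study data).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in the number of participants, which is fine for a small demonstration; a rank-based formulation would be preferable at cohort scale.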

Results: During an average follow-up of 4.1 (interquartile range = 3.3-4.5) years, 1,302 (8.17%) participants developed diabetes. The constructed BN model showed the associations (direct, indirect, or none) among 24 risk factors; only hypertension, impaired fasting glucose (IFG; fasting glucose of 5.6-6.9 mmol/L), and greater waist circumference (WC) were directly associated with new-onset diabetes. The risk prediction model showed that the post-test probability of developing diabetes in participants with hypertension, IFG, and greater WC was 27.5%, with an AUC of 0.746 [95% confidence interval (CI) = 0.732-0.760], sensitivity of 0.727 (95% CI = 0.703-0.752), and specificity of 0.660 (95% CI = 0.652-0.667). This prediction model appeared to perform better than a logistic regression model using the same three predictors (AUC = 0.734, 95% CI = 0.703-0.764, sensitivity = 0.604, and specificity = 0.745).
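When the outcome node of a BN has three direct parents, the post-test probability for a given risk profile is an entry in that node's conditional probability table. The Python sketch below illustrates the idea, assuming simple estimation from counts. The function name `conditional_probability` is hypothetical and the records are invented toy data, so the printed probabilities do not reproduce the 27.5% reported above. The last lines only re-derive the cumulative incidence from the counts given in the abstract.

```python
from collections import Counter


def conditional_probability(records):
    """Estimate P(diabetes = 1 | hypertension, IFG, high WC) from counts.

    Each record is (hypertension, ifg, high_wc, diabetes) with 0/1 values.
    Returns a dict mapping each observed parent configuration to a probability.
    """
    totals = Counter()
    cases = Counter()
    for hypertension, ifg, high_wc, diabetes in records:
        parents = (hypertension, ifg, high_wc)
        totals[parents] += 1
        cases[parents] += diabetes
    return {parents: cases[parents] / totals[parents] for parents in totals}


# Invented toy records, for illustration only (not study data).
toy_records = [
    (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 1, 0), (1, 1, 1, 0),
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0),
    (0, 0, 0, 0), (0, 0, 0, 1),
]
cpt = conditional_probability(toy_records)
print(cpt[(1, 1, 1)])  # all three risk factors present
print(cpt[(0, 0, 0)])  # no risk factor present

# Cumulative incidence from the abstract: 1,302 cases among 15,934 participants.
incidence = 1302 / 15934
print(round(100 * incidence, 2))
```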

Conclusions: We report the first BN model for predicting new-onset diabetes, which uses the smallest number of factors among existing models in the literature. Compared with existing regression models, the BN provided a more comprehensive graphical depiction of the inter-relations between multiple factors and diabetes.

Keywords: Bayesian network; causal model; diabetes; directed acyclic graph; risk factors.

Publication types

  • Randomized Controlled Trial
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Bayes Theorem
  • Biological Specimen Banks
  • China / epidemiology
  • Cohort Studies
  • Diabetes Mellitus* / epidemiology
  • Diabetes Mellitus* / etiology
  • Female
  • Glucose
  • Humans
  • Hypertension*
  • Male
  • Middle Aged
  • Prospective Studies

Substances

  • Glucose