Data-driven assessment, contextualisation and implementation of 134 variables in the risk for type 2 diabetes: an analysis of Lifelines, a prospective cohort study in the Netherlands

Diabetologia. 2021 Jun;64(6):1268-1278. doi: 10.1007/s00125-021-05419-1. Epub 2021 Mar 12.

Abstract

Aims/hypothesis: We aimed to assess and contextualise 134 potential risk variables for the development of type 2 diabetes and to determine their applicability in risk prediction.

Methods: A total of 96,534 people without baseline diabetes (372,007 person-years) from the Dutch Lifelines cohort were included. We used a risk variable-wide association study (RV-WAS) design to independently screen and replicate risk variables for 5-year incidence of type 2 diabetes. For identified variables, we contextualised HRs, calculated correlations and assessed their robustness and unique contribution in different clinical contexts using bootstrapped and cross-validated lasso regression models. We evaluated the change in risk, or 'HR trajectory', when sequentially assigning variables to a model.
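
The screen-and-replicate step of an RV-WAS design can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the abstract does not state the multiple-testing correction or the replication criteria, so the Benjamini-Hochberg screen, the 0.05 threshold, the same-direction rule, and all variable names and numbers below are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

# Each estimate is (log hazard ratio, p-value) for one candidate variable.
Estimate = Tuple[float, float]


def benjamini_hochberg(pvalues: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the names that pass a Benjamini-Hochberg FDR screen."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    # Find the largest rank k whose p-value is at most alpha * k / m.
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    return [name for name, _ in ranked[:cutoff]]


def screen_and_replicate(
    discovery: Dict[str, Estimate],
    replication: Dict[str, Estimate],
    alpha: float = 0.05,
) -> List[str]:
    """Keep variables that pass the screen and replicate in an independent set."""
    screened = benjamini_hochberg(
        {name: p for name, (_, p) in discovery.items()}, alpha
    )
    replicated = []
    for name in screened:
        if name not in replication:
            continue
        log_hr_disc, _ = discovery[name]
        log_hr_repl, p_repl = replication[name]
        # Assumed rule: same direction of effect and nominal significance.
        if log_hr_disc * log_hr_repl > 0 and p_repl < alpha:
            replicated.append(name)
    return replicated


# Hypothetical estimates, invented for illustration only.
discovery = {
    "glucose": (0.80, 1e-30),
    "waist": (0.45, 1e-12),
    "coffee": (-0.05, 0.03),
    "noise": (0.01, 0.70),
}
replication = {
    "glucose": (0.78, 1e-28),
    "waist": (0.41, 1e-10),
    "coffee": (0.02, 0.60),
    "noise": (0.00, 0.90),
}

replicated = screen_and_replicate(discovery, replication)
```

In this toy input, "coffee" passes the discovery screen but is dropped because its effect changes direction in the replication set, which is the kind of inconsistent association the design is meant to filter out.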

Results: We identified 63 risk variables, with novel associations for quality-of-life indicators and non-cardiovascular medications (i.e., proton-pump inhibitors, anti-asthmatics). For continuous variables, the increase of 1 SD of HbA1c, i.e., 3.39 mmol/mol (0.31%), was equivalent in risk to an increase of 0.53 mmol/l of glucose, 19.8 cm of waist circumference, 8.34 kg/m² of BMI, 0.67 mmol/l of HDL-cholesterol, and 0.14 mmol/l of uric acid. Other variables required an increase of >3 SD, which is either not physiologically realistic or a rare occurrence in the population. Although the identified variables were moderately correlated, the inclusion of four variables satiated prediction models. Invasive variables, except for glucose and HbA1c, contributed little compared with non-invasive variables. Glucose, HbA1c and family history of diabetes explained a unique part of disease risk. Adding risk variables to a satiated model can impact the HRs of variables already in the model.
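
The contextualisation of HRs described above can be illustrated with a short calculation. The sketch assumes a log-linear (Cox-type) model in which each continuous variable has an HR per 1 SD increase; under that assumption, the number of SDs of a variable needed to match +1 SD of a reference variable is ln(HR_ref) / ln(HR_var). The function names and the example HRs and SD are hypothetical and are not taken from the study.

```python
import math


def sd_multiples_needed(hr_ref_per_sd: float, hr_var_per_sd: float) -> float:
    """Number of SDs of a variable matching the risk of +1 SD of the reference.

    Assumes a log-linear model, so log hazard scales linearly with the
    variable. A negative result means the variable must decrease instead.
    """
    if hr_ref_per_sd <= 0 or hr_var_per_sd <= 0:
        raise ValueError("hazard ratios must be positive")
    if hr_var_per_sd == 1.0:
        raise ValueError("a variable with HR = 1 can never match the reference")
    return math.log(hr_ref_per_sd) / math.log(hr_var_per_sd)


def equivalent_increase(
    hr_ref_per_sd: float, hr_var_per_sd: float, sd_var: float
) -> float:
    """Same quantity expressed in the variable's own units."""
    return sd_multiples_needed(hr_ref_per_sd, hr_var_per_sd) * sd_var


# Hypothetical inputs: reference HR of 2.0 per SD, variable HR of 1.5 per SD,
# and a variable SD of 12.0 units.
n_sd = sd_multiples_needed(2.0, 1.5)
delta = equivalent_increase(2.0, 1.5, 12.0)

# Flag increments beyond 3 SD, the range the abstract describes as
# unrealistic or rare.
is_realistic = abs(n_sd) <= 3.0
```

Expressing every variable against the same reference increment puts weak and strong risk variables on one scale, which is what makes statements such as "an increase of >3 SD" comparable across variables.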

Conclusions: Many variables show weak or inconsistent associations with the development of type 2 diabetes, and only a handful can reliably explain disease risk. Newly discovered risk variables will yield little over established factors, and existing prediction models can be simplified. A systematic, data-driven approach to identify risk variables for the prediction of type 2 diabetes is necessary for the practice of precision medicine.

Keywords: Contextualisation; Data-driven; Identification; Lasso regression; Machine learning; Prediction models; Prospective; Risk variable-wide association study; Type 2 diabetes.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Diabetes Mellitus, Type 2 / epidemiology*
  • Female
  • Humans
  • Hyperglycemia / epidemiology*
  • Incidence
  • Male
  • Middle Aged
  • Netherlands / epidemiology
  • Prediabetic State / epidemiology*
  • Prospective Studies
  • Risk
  • Risk Assessment