A machine learning-based diagnosis modelling of type 2 diabetes mellitus with environmental metal exposure

Comput Methods Programs Biomed. 2023 Jun:235:107537. doi: 10.1016/j.cmpb.2023.107537. Epub 2023 Apr 5.

Abstract

Background and objective: Increasing and compelling evidence indicates that urinary and dietary metal exposures are underappreciated but potentially modifiable biomarkers for type 2 diabetes mellitus (T2DM). The aims of this study were (1) to identify an effective and parsimonious set of key potential biomarkers contributing to T2DM and (2) to assess the utility of baseline variables and metal exposure in the diagnosis of T2DM.

Methods: From the National Health and Nutrition Examination Survey (NHANES), we selected 9822 screening records with 82 significant variables covering demographics, lifestyle, anthropometric measures, diet and metal exposure. A soft voting ensemble model combining extreme gradient boosting (XGBoost), random forest and light gradient boosting machine (LightGBM) was proposed to measure the importance of the 82 features. Using this soft voting ensemble model together with the variance inflation factor (VIF), strongly multicollinear features with low importance scores were then removed from the candidate biomarkers. Finally, a soft voting ensemble classifier was used to demonstrate the effectiveness of the proposed feature selection method.
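The feature selection idea described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the VIF threshold of 10 (a common rule of thumb), the rule of dropping the least important feature among the collinear ones, and the synthetic data are all hypothetical. VIF is computed here from the diagonal of the inverse correlation matrix, and the per-model importance scores stand in for what XGBoost, random forest and LightGBM would report.

```python
import numpy as np

def soft_vote(prob_matrix):
    """Soft voting: average the positive-class probabilities across models.
    Rows are models, columns are samples."""
    return np.mean(np.asarray(prob_matrix, dtype=float), axis=0)

def ensemble_importance(importances):
    """Normalize each model's importance scores to sum to 1, then average."""
    imp = np.asarray(importances, dtype=float)
    imp = imp / imp.sum(axis=1, keepdims=True)
    return imp.mean(axis=0)

def vif(X):
    """VIF of each column, taken as the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def prune_collinear(X, importance, vif_threshold=10.0):
    """Repeatedly drop the least important feature among those whose VIF
    exceeds the threshold (assumed rule), until none exceeds it."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        values = vif(X[:, keep])
        high = [i for i, v in zip(keep, values) if v > vif_threshold]
        if not high:
            break
        keep.remove(min(high, key=lambda i: importance[i]))
    return keep

# Synthetic data (not from the study): feature 2 is a near copy of feature 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
X = np.column_stack([a, b, c])

# Hypothetical importance scores from three models, one row per model.
importance = ensemble_importance([[0.5, 0.3, 0.2],
                                  [0.6, 0.2, 0.2],
                                  [0.4, 0.4, 0.2]])
kept = prune_collinear(X, importance)
print("kept feature indices:", kept)
```

In this toy setup, features 0 and 2 are strongly collinear, so the less important of the two is removed while the independent feature 1 is retained.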

Results: With the novel feature selection method, 12 baseline variables and 3 metal variables were selected to detect patients at risk for T2DM. Among the metal variables, dietary copper (Cu), urinary cadmium (Cd) and urinary mercury (Hg) were selected as the most notable metal exposures, with all corresponding P-values less than 0.05. In a T2DM classification model with 12 baseline biomarkers, adding the 3 metal exposures improved the area under the receiver operating characteristic (ROC) curve (AUC) from 0.792 with baseline biomarkers alone to 0.847.
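For readers less familiar with the AUC metric used in this comparison, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are toy values chosen for illustration only; they are not data from the study, and the variable names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: the same six subjects scored by two hypothetical models.
labels = [0, 0, 0, 1, 1, 1]
scores_baseline = [0.2, 0.6, 0.4, 0.5, 0.3, 0.9]       # baseline variables only
scores_with_metals = [0.2, 0.55, 0.3, 0.5, 0.6, 0.9]   # baseline plus metal variables

auc_baseline = roc_auc(labels, scores_baseline)
auc_with_metals = roc_auc(labels, scores_with_metals)
print(f"baseline AUC: {auc_baseline:.3f}, with metals AUC: {auc_with_metals:.3f}")
```

A higher AUC for the second set of scores means that model ranks more true cases above non-cases, which is the sense in which the added metal variables are reported to improve classification.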

Conclusions: This was the first demonstration of T2DM classification with machine learning using urinary and dietary metal exposure. The improved classification performance illustrated the effectiveness of the proposed machine learning-based diagnosis model, which could facilitate lifestyle/dietary intervention for T2DM prevention.

Keywords: Environmental metal exposure; Machine learning; Statistical analysis; Type 2 diabetes mellitus.

MeSH terms

  • Biomarkers
  • Diabetes Mellitus, Type 2* / diagnosis
  • Humans
  • Machine Learning
  • Nutrition Surveys

Substances

  • Biomarkers