Machine learning-based prediction of vitamin D deficiency: NHANES 2001-2018

Front Endocrinol (Lausanne). 2024 Feb 16:15:1327058. doi: 10.3389/fendo.2024.1327058. eCollection 2024.

Abstract

Background: Vitamin D deficiency is strongly associated with the development of several diseases. Given the current global pandemic of vitamin D deficiency, it is critical to identify people at high risk. No tools currently exist for predicting the risk of vitamin D deficiency in the general community population. This study aims to use machine learning to predict that risk from data that can be obtained through simple interviews in the community.

Methods: The National Health and Nutrition Examination Survey (NHANES) 2001-2018 dataset was used for the analysis and was randomly divided into training and validation sets in a 70:30 ratio. Models were constructed with six methods: gradient boosting machine (GBM), logistic regression (LR), neural network (NNet), random forest (RF), support vector machine (SVM), and XGBoost, and their performance was evaluated. The best-performing model was interpreted using SHAP values and further developed into an online web calculator.
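
The random 70:30 division described in the Methods can be illustrated with a minimal sketch. It uses only the Python standard library. The function name `split_70_30`, the fixed seed, and the use of integer IDs as stand-ins for participant records are assumptions made for illustration; the abstract does not state how the authors implemented the split or whether it was stratified.

```python
import random


def split_70_30(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and split it into (training, validation).

    Hypothetical helper: the seed and rounding rule are illustrative choices,
    not details taken from the study.
    """
    rng = random.Random(seed)      # local RNG so the split is reproducible
    shuffled = list(records)       # copy; the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Stand-in IDs for the 62,919 participants reported in the Results.
participant_ids = list(range(62919))
train_ids, valid_ids = split_70_30(participant_ids)
```

Every participant lands in exactly one of the two sets, so the validation set stays unseen during model fitting.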

Results: The study enrolled 62,919 participants, all aged 2 years or older, of whom 20,204 (32.1%) had vitamin D deficiency. The models constructed by each method were evaluated using AUC as the primary evaluation statistic and ACC, PPV, NPV, SEN, SPE, F1 score, MCC, Kappa, and Brier score as secondary evaluation statistics. The XGBoost-based model performed best, with near-perfect performance. The summary plot of SHAP values shows that the three most important features for this model are race, age, and BMI. An online web calculator based on this model can easily and quickly predict the risk of vitamin D deficiency.
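
For readers unfamiliar with the evaluation statistics named in the Results, the sketch below shows how each can be computed from true labels and predicted probabilities. It uses the standard textbook definitions and only the Python standard library. The 0.5 decision threshold, the function names, and the eight-sample toy data are assumptions for illustration; they are not the study's data, settings, or results.

```python
import math


def auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_prob, threshold=0.5):
    """Return the statistics listed in the Results.

    Assumes every denominator is non-zero (both classes are present
    and both classes are predicted at the chosen threshold).
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    sen = tp / (tp + fn)           # sensitivity (recall)
    spe = tn / (tn + fp)           # specificity
    f1 = 2 * ppv * sen / (ppv + sen)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - p_chance) / (1 - p_chance)
    # Brier score: mean squared error of the predicted probabilities.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    return {"AUC": auc(y_true, y_prob), "ACC": acc, "PPV": ppv, "NPV": npv,
            "SEN": sen, "SPE": spe, "F1": f1, "MCC": mcc,
            "Kappa": kappa, "Brier": brier}


# Toy labels (1 = deficient) and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
metrics = evaluate(y_true, y_prob)
```

AUC and the Brier score are computed from the predicted probabilities directly, whereas the remaining statistics depend on the chosen threshold.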

Conclusion: In this study, the XGBoost-based prediction tool performs flawlessly and is highly accurate in predicting the risk of vitamin D deficiency in community populations.

Keywords: clinical decision rules; machine learning; nutrition surveys; public health; vitamin D deficiency.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child, Preschool
  • Humans
  • Machine Learning*
  • Nutrition Surveys
  • Pandemics
  • Vitamin D Deficiency* / epidemiology

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. JG and YL were funded by the “Postgraduate Innovation Research and Practice Program of Anhui Medical University” (No. YJS20230090).