Context: Conventional prediction models for vitamin D deficiency have limited accuracy.
Objective: Using cross-sectional data, we developed prediction models based on machine learning (ML) and compared their performance with that of a model based on a conventional approach.
Methods: Participants were 5106 community-resident adults (50-84 years; 58% male). In the randomly sampled training set (65%), we constructed 5 ML models: lasso regression, elastic net regression, random forest, gradient boosted decision tree, and dense neural network. The reference model was a logistic regression model. Outcomes were deseasonalized serum 25-hydroxyvitamin D (25(OH)D) <50 nmol/L (yes/no) and <25 nmol/L (yes/no). In the test set (the remaining 35%), we evaluated the predictive performance of each model, including the area under the receiver operating characteristic curve (AUC) and net benefit (decision curve analysis).
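As an illustration of the evaluation steps named in the Methods, the following is a minimal sketch, not the study's code. It assumes the standard definitions of AUC (the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case) and of net benefit at a threshold probability p_t (TP/N - FP/N x p_t/(1 - p_t)). The function names, the random seed, and the toy labels and scores are hypothetical and chosen only for demonstration.

```python
import random


def split_train_test(ids, train_fraction=0.65, seed=0):
    """Randomly split participant IDs into training and test sets.

    The 65%/35% proportions follow the Methods; the seed is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(y_true, y_score):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Net benefit when everyone with predicted risk >= threshold is treated as positive."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (invented for illustration; not study data).
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

train_ids, test_ids = split_train_test(range(5106))
toy_auc = auc(toy_labels, toy_scores)
toy_nb = net_benefit(toy_labels, toy_scores, threshold=0.5)
```

A decision curve is obtained by repeating the net benefit calculation over a range of threshold probabilities and plotting the results for each model alongside the "treat all" and "treat none" strategies.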
Results: Overall, 1270 (25%) and 91 (2%) participants had 25(OH)D <50 and <25 nmol/L, respectively. Compared with the reference model, the ML models predicted 25(OH)D <50 nmol/L with similar accuracy. However, for prediction of 25(OH)D <25 nmol/L, all ML models had higher AUC point estimates than the reference model by up to 0.14. AUC was highest for elastic net regression (0.93; 95% CI 0.90-0.96), compared with 0.81 (95% CI 0.71-0.91) for the reference model. In the decision curve analysis, ML models mostly achieved a greater net benefit than the reference model across a range of thresholds.
Conclusion: Compared with conventional models, ML models predicted 25(OH)D <50 nmol/L with similar accuracy but predicted 25(OH)D <25 nmol/L with greater accuracy. The latter finding suggests a role for ML models in participant selection for vitamin D supplement trials.
Keywords: Vitamin D; machine learning; prediction; vitamin D deficiency.
© The Author(s) 2022. Published by Oxford University Press on behalf of the Endocrine Society. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.