Prediction Models of Early Childhood Caries Based on Machine Learning Algorithms

Int J Environ Res Public Health. 2021 Aug 15;18(16):8613. doi: 10.3390/ijerph18168613.

Abstract

In this study, we developed machine learning (ML)-based prediction models for early childhood caries (ECC) and compared their performance with that of a traditional regression model. We analyzed data on 4195 children aged 1-5 years from the Korea National Health and Nutrition Examination Survey (2007-2018). We developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms in addition to logistic regression. Two methods were applied for variable selection: a regression-based backward elimination and a random forest-based permutation importance classifier. We compared the area under the receiver operating characteristic curve (AUROC) and the misclassification rate across the models. All four prediction models had AUROC values between 0.774 and 0.785, with no significant difference among them. These results suggest that both traditional logistic regression and ML-based models can show favorable performance and can be used to predict ECC, identify high-risk groups, and implement active preventive treatments. However, further research is needed to improve the performance of the prediction models using recent methods, such as deep learning.
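The two comparison metrics named in the abstract, AUROC and misclassification rate, can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the abstract does not state which threshold the study used.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def misclassification_rate(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of cases whose thresholded prediction differs from the label.

    The 0.5 default threshold is an assumption for this sketch.
    """
    if len(y_true) == 0:
        raise ValueError("need at least one case")
    errors = sum(int(s >= threshold) != y for y, s in zip(y_true, y_score))
    return errors / len(y_true)


# Hypothetical toy data: 1 = caries present, 0 = caries absent.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]  # predicted probabilities

model_auroc = auroc(labels, scores)
model_error = misclassification_rate(labels, scores)
print(f"AUROC = {model_auroc:.3f}, misclassification rate = {model_error:.3f}")
```

The pairwise definition is quadratic in the number of cases, which is fine for illustration; for a sample of several thousand children, a rank-based implementation from a statistics library would be the more practical choice.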

Keywords: Korea National Health and Nutrition Survey; early childhood caries; machine learning; prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Child
  • Child, Preschool
  • Dental Caries Susceptibility*
  • Humans
  • Logistic Models
  • Machine Learning*
  • Nutrition Surveys