Systematic comparison of machine learning algorithms to develop and validate predictive models for periodontitis

Nasir Z Bashir; Zahid Rahman; Sam Li-Sheng Chen

doi:10.1111/jcpe.13692

J Clin Periodontol. 2022 Oct;49(10):958-969. doi: 10.1111/jcpe.13692. Epub 2022 Jul 28.

Authors

Nasir Z Bashir¹ ², Zahid Rahman³, Sam Li-Sheng Chen⁴

Affiliations

¹ School of Oral and Dental Sciences, University of Bristol, Bristol, UK.
² School of Mathematics and Statistics, The University of Sheffield, Sheffield, UK.
³ 1859 Capital LLP, London, UK.
⁴ School of Oral Hygiene, College of Oral Medicine, Taipei Medical University, Taipei, Taiwan.

Abstract

Aim: The aim of this study was to compare the validity of different machine learning algorithms to develop and validate predictive models for periodontitis.

Materials and methods: Using national survey data from Taiwan (n = 3453) and the United States (n = 3685), predictors of periodontitis were extracted from the datasets and pre-processed, and then 10 machine learning algorithms were trained to develop predictive models. The models were validated both internally (bootstrap sampling) and externally (alternative country's dataset). The algorithms were compared across six performance metrics ([i] area under the curve for the receiver operating characteristic [AUC], [ii] accuracy, [iii] sensitivity, [iv] specificity, [v] positive predictive value, and [vi] negative predictive value) and two methods of data pre-processing ([i] machine-learning-based feature selection and [ii] dimensionality reduction into principal components).
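As an illustration of the six performance metrics listed above, the following minimal sketch computes them for a binary classifier from true labels and predicted scores. It is not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than any particular library routine.

```python
from typing import Dict, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Compute AUC, accuracy, sensitivity, specificity, PPV and NPV."""
    # Turn scores into hard predictions at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # AUC as the probability that a randomly chosen positive case scores
    # higher than a randomly chosen negative case (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )

    return {
        "auc": _ratio(wins, len(pos) * len(neg)),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical toy data: 1 = periodontitis, 0 = no periodontitis.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
toy_metrics = binary_metrics(toy_labels, toy_scores)
```

Only the AUC is independent of the decision threshold. The other five metrics change with the cut-off, and the predictive values also depend on disease prevalence in the cohort being evaluated.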

Results: Many algorithms showed extremely strong performance during internal validation (AUC > 0.95, accuracy > 95%). However, this was not replicated in external validation, where predictive performance of all algorithms dropped off drastically. Furthermore, predictive performance differed according to data pre-processing methodology and the cohort on which they were trained.

Conclusions: Larger sample sizes and more complex predictors of periodontitis are required before machine learning can be leveraged to its full potential.

Keywords: computing; machine learning; periodontitis; predictive modelling; statistics.

MeSH terms

Algorithms
Humans
Machine Learning*
Periodontitis* / diagnosis
Predictive Value of Tests
ROC Curve