Benchmarking Deep Learning Models for Tooth Structure Segmentation

L Schneider; L Arsiwala-Scheppach; J Krois; H Meyer-Lueckel; K K Bressem; S M Niehues; F Schwendicke

doi:10.1177/00220345221100169

J Dent Res. 2022 Oct;101(11):1343-1349. doi: 10.1177/00220345221100169. Epub 2022 Jun 9.

Authors

L Schneider¹ ², L Arsiwala-Scheppach¹ ², J Krois¹ ², H Meyer-Lueckel³, K K Bressem⁴ ⁵, S M Niehues⁴, F Schwendicke¹ ²

Affiliations

¹ Department of Oral Diagnostics, Digital Health and Health Services Research, Charité-Universitätsmedizin, Berlin, Germany.
² ITU/WHO Focus Group on AI for Health, Topic Group Dental Diagnostics and Digital Dentistry, Geneva, Switzerland.
³ Department of Restorative, Preventive and Pediatric Dentistry, Zahnmedizinische Kliniken der Universität Bern, University of Bern, Bern, Switzerland.
⁴ Charité-Universitätsmedizin Berlin, Klinik für Radiologie, Berlin, Germany.
⁵ Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany.

Abstract

A wide range of deep learning (DL) architectures with varying depths are available, with developers usually choosing one or a few of them for their specific task in a nonsystematic way. Benchmarking (i.e., the systematic comparison of state-of-the-art architectures on a specific task) may provide guidance in the model development process and may allow developers to make better decisions. However, comprehensive benchmarking has not yet been performed in dentistry. We aimed to benchmark a range of architecture designs for 1 specific, exemplary case: tooth structure segmentation on dental bitewing radiographs. We built 72 models for tooth structure (enamel, dentin, pulp, fillings, crowns) segmentation by combining 6 different DL network architectures (U-Net, U-Net++, Feature Pyramid Networks, LinkNet, Pyramid Scene Parsing Network, Mask Attention Network) with 12 encoders from 3 different encoder families (ResNet, VGG, DenseNet) of varying depth (e.g., VGG13, VGG16, VGG19). To each model design, 3 initialization strategies (ImageNet, CheXpert, random initialization) were applied, resulting in 216 models overall, each trained for up to 200 epochs with the Adam optimizer (learning rate = 0.0001) and a batch size of 32. Our data set consisted of 1,625 human-annotated dental bitewing radiographs. We used a 5-fold cross-validation scheme and quantified model performance primarily by the F1-score. Initialization with ImageNet or CheXpert weights significantly outperformed random initialization (P < 0.05). Deeper and more complex models did not necessarily perform better than less complex alternatives. VGG-based models were more robust across model configurations, while more complex models (e.g., from the ResNet family) achieved peak performances. In conclusion, initializing models with pretrained weights may be recommended when training models for dental radiographic analysis. Less complex model architectures may be competitive alternatives if computational resources and training time are restricting factors. Models developed and found superior on nondental data sets may not show this behavior for dental domain-specific tasks.
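
The model counts in the abstract follow from a full factorial design: 6 architectures x 12 encoders = 72 model designs, and 72 designs x 3 initialization strategies = 216 trained models. The short Python sketch below only reproduces that arithmetic by enumerating the grid; it is not the authors' code. The architecture and initialization labels are taken from the abstract, whereas the encoder names (encoder_01 ... encoder_12) are hypothetical placeholders, since the abstract names only VGG13, VGG16, and VGG19 and does not list all 12 encoders.

```python
from itertools import product

# Architecture names as listed in the abstract.
ARCHITECTURES = [
    "U-Net",
    "U-Net++",
    "Feature Pyramid Network",
    "LinkNet",
    "Pyramid Scene Parsing Network",
    "Mask Attention Network",
]

# ASSUMPTION: placeholder labels. The abstract states that 12 encoders from
# 3 families (ResNet, VGG, DenseNet) were used but does not enumerate them.
ENCODERS = [f"encoder_{i:02d}" for i in range(1, 13)]

# Initialization strategies as listed in the abstract.
INITIALIZATIONS = ["ImageNet", "CheXpert", "random"]

# Every architecture paired with every encoder gives one model design.
model_designs = list(product(ARCHITECTURES, ENCODERS))

# Every model design is trained once per initialization strategy.
trained_models = list(product(ARCHITECTURES, ENCODERS, INITIALIZATIONS))

print(len(model_designs))   # 6 * 12 = 72
print(len(trained_models))  # 72 * 3 = 216
```

Enumerating the grid explicitly, rather than hard-coding the totals, makes it straightforward to check that no architecture, encoder, or initialization combination is skipped or duplicated when a benchmark of this kind is scheduled.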

Keywords: artificial intelligence; computer vision; neural networks; segmentation; tooth structures; transfer learning.

MeSH terms

Benchmarking
Deep Learning*
Humans
Image Processing, Computer-Assisted / methods
Neural Networks, Computer
Tooth*

Grants and funding

001/WHO_/World Health Organization/International