Cephalometric analysis performance discrepancy between orthodontists and an artificial intelligence model using lateral cephalometric radiographs

J Esthet Restor Dent. 2024 Apr;36(4):555-565. doi: 10.1111/jerd.13156. Epub 2023 Oct 26.

Abstract

Purpose: This clinical study compared the Ricketts and Steiner cephalometric analyses obtained by two experienced orthodontists with those obtained by an artificial intelligence (AI)-based software program, and measured the variability between the two orthodontists.

Materials and methods: A total of 50 lateral cephalometric radiographs from 50 patients were obtained. Two groups were created depending on the operator performing the cephalometric analysis: orthodontists (Orthod group) and an AI software program (AI group). In the Orthod group, two independent experienced orthodontists identified the cephalometric landmarks manually and used a software program (NemoCeph; Nemotec) to calculate the measurements. In the AI group, an AI software program (CephX; ORCA Dental AI) was used for both the automatic landmark identification and the cephalometric measurements. The Ricketts and Steiner cephalometric analyses were assessed in both groups, for a total of 24 measurements. The Shapiro-Wilk test showed that the data were normally distributed. The t-test was used to analyze the data (α = 0.05).
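As a rough illustration of the group comparison described above, the sketch below computes a t statistic for one cephalometric parameter. It assumes a paired design (both groups measured the same radiographs), which the abstract does not state. The function name `paired_t_statistic` and all sample values are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t_statistic(orthod, ai):
    """Return (t, degrees_of_freedom) for paired measurements.

    Assumption: each position in `orthod` and `ai` refers to the same
    radiograph, so the test is run on the per-radiograph differences.
    """
    if len(orthod) != len(ai) or len(orthod) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [o - a for o, a in zip(orthod, ai)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up overjet values in mm for five radiographs (illustration only).
orthod_values = [2.0, 3.0, 2.5, 3.5, 3.0]
ai_values = [1.5, 2.0, 2.5, 2.5, 2.0]
t, df = paired_t_statistic(orthod_values, ai_values)
print(f"t = {t:.3f}, df = {df}")
```

The p-value would then come from the t distribution with `df` degrees of freedom and be compared against α = 0.05. A statistics package such as SciPy (`scipy.stats.shapiro` for the normality check, `scipy.stats.ttest_rel` for the paired test) would normally handle both steps.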

Results: The t-test analysis showed significant measurement discrepancies between the Orthod and AI groups in seven of the 24 cephalometric parameters tested, namely the corpus length (p = 0.003), mandibular arc (p < 0.001), lower face height (p = 0.005), overjet (p = 0.019), and overbite (p = 0.022) in the Ricketts cephalometric analysis and occlusal to SN (p = 0.002) and GoGn-SN (p < 0.001) in the Steiner cephalometric analysis. The intraclass correlation coefficient (ICC) between the two orthodontists of the Orthod group was calculated for each cephalometric measurement.
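The agreement between two raters on one measurement can be sketched as follows. The abstract does not say which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name `icc_2_1` and the sample ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per subject, one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)       # number of subjects (radiographs)
    k = len(ratings[0])    # number of raters (orthodontists)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up ratings: rater B reads every subject exactly 1 unit higher than rater A.
offset_ratings = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_2_1(offset_ratings), 3))  # 0.667
```

Because this form measures absolute agreement, a constant offset between the raters lowers the coefficient even though their rankings are identical. A consistency-type ICC would return 1.0 for the same made-up data.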

Conclusions: Significant discrepancies between the orthodontists and the AI-based program were found in seven of the 24 cephalometric measurements tested. The inter-operator reliability analysis showed reproducible measurements between the two orthodontists, except for the corpus length measurement.

Clinical significance: The artificial intelligence software program tested has the potential to perform cephalometric analyses automatically from lateral cephalometric radiographs; however, additional studies are needed to further evaluate the accuracy of this AI-based system.

Keywords: artificial intelligence; cephalometric analysis; machine learning; orthodontics.

MeSH terms

  • Artificial Intelligence*
  • Cephalometry
  • Humans
  • Orthodontists*
  • Reproducibility of Results