An overview of the performance of AI in fracture detection in lumbar and thoracic spine radiographs on a per vertebra basis

Skeletal Radiol. 2024 Feb 27. doi: 10.1007/s00256-024-04626-2. Online ahead of print.

Abstract

Purpose: Subtle spinal compression fractures can easily be missed on radiographs. AI may help in interpreting these images. We propose to test the performance of an FDA-approved algorithm for fracture detection in radiographs on a per vertebra basis, assessing performance based on grade of compression, presence of foreign material, severity of degenerative changes, and acuity of the fracture.

Methods: Thoracic and lumbar spine radiographs obtained with a clinical query for fracture were retrospectively collected and analyzed by the AI. The presence or absence of fracture was defined by the written report or by cross-sectional imaging where available. Fractures were classified semi-quantitatively by the Genant classification, by acuity, by the presence of foreign material, and by the overall degree of degenerative change of the spine. The results of the AI were compared to the gold standard.
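
As an illustration of what a per vertebra comparison against the gold standard involves, the following is a minimal Python sketch, not the study's actual analysis code. It assumes each vertebra is reduced to a pair of booleans (reference standard positive, AI flagged); the function name `per_vertebra_metrics` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def per_vertebra_metrics(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from per-vertebra label pairs.

    Each pair is (reference_fracture, ai_flagged) for one vertebra.
    This pairing is an assumption made for illustration only.
    """
    tp = fn = tn = fp = 0
    for reference_fracture, ai_flagged in pairs:
        if reference_fracture and ai_flagged:
            tp += 1  # fractured vertebra detected by the AI
        elif reference_fracture:
            fn += 1  # fractured vertebra missed by the AI
        elif ai_flagged:
            fp += 1  # intact vertebra flagged by the AI
        else:
            tn += 1  # intact vertebra correctly left unflagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one fractured and one intact vertebra")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy, made-up labels for six vertebrae (not study data).
toy_pairs = [
    (True, True),
    (True, False),
    (False, False),
    (False, False),
    (False, True),
    (True, True),
]
sensitivity, specificity = per_vertebra_metrics(toy_pairs)
print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```

Subgroup figures (for example by Genant grade or spinal region) would follow from applying the same function to the vertebrae in each subgroup.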

Results: A total of 512 exams were included, depicting 4114 vertebrae with 495 fractures. Overall sensitivity was 63.2% for the lumbar spine, significantly higher than the 50.6% for the thoracic spine. Specificity was 96.7% and 98.3%, respectively. Sensitivity increased with fracture grade, without a significant difference between grade 2 and grade 3 compression fractures (lumbar spine: grade 1, 52.5%; grade 2, 72.3%; grade 3, 75.8%; thoracic spine: grade 1, 42.4%; grade 2, 60.0%; grade 3, 60.0%). The presence of foreign material and a high degree of degenerative changes reduced sensitivity.

Conclusion: Overall performance of the AI on a per vertebra basis was degraded in clinically relevant scenarios such as for low-grade compression fractures.

Keywords: Artificial intelligence; Computer-aided diagnosis; Radiography; Trauma.