Benign vs malignant vertebral compression fractures with MRI: a comparison between automatic deep learning network and radiologist's assessment

Eur Radiol. 2023 Jul;33(7):5060-5068. doi: 10.1007/s00330-023-09713-x. Epub 2023 May 10.

Abstract

Objective: To test the diagnostic performance of a deep-learning Two-Stream Compare and Contrast Network (TSCCN) model for differentiating benign and malignant vertebral compression fractures (VCFs) based on MRI.

Methods: We tested a deep-learning system in 123 benign and 86 malignant VCFs. The median sagittal T1-weighted images (T1WI), T2-weighted images with fat suppression (T2WI-FS), and a combination of both (hereafter, T1WI/T2WI-FS) were used to validate TSCCN. Receiver operating characteristic (ROC) curves were analyzed to evaluate the performance of TSCCN. The accuracy, sensitivity, and specificity of TSCCN in differentiating benign and malignant VCFs were calculated and compared with radiologists' assessments. Intraclass correlation coefficients (ICCs) were calculated to assess the intra- and inter-observer agreement of radiologists in differentiating malignant from benign VCFs.
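As an illustration of how accuracy, sensitivity, and specificity follow from a confusion matrix, the sketch below treats malignant as the positive class. The function name `diagnostic_metrics` and the per-cell counts are hypothetical: only the class totals (86 malignant, 123 benign) come from the abstract, which does not report a confusion matrix.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: malignant VCFs classified as malignant/benign.
    tn/fp: benign VCFs classified as benign/malignant.
    """
    positives = tp + fn  # all malignant cases
    negatives = tn + fp  # all benign cases
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
    }


# Hypothetical counts, chosen only so the class totals match the cohort size
# (80 + 6 = 86 malignant, 120 + 3 = 123 benign). They are not study results.
example = diagnostic_metrics(tp=80, fn=6, tn=120, fp=3)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

Because specificity depends only on the benign cases and sensitivity only on the malignant ones, the two can diverge even when overall accuracy is high, which is why all three are reported.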

Results: The areas under the ROC curve (AUCs) of TSCCN for T1WI, T2WI-FS, and T1WI/T2WI-FS images were 99.2%, 91.7%, and 98.2%, respectively. The accuracy of TSCCN based on T1WI, T2WI-FS, and T1WI/T2WI-FS was 95.2%, 90.4%, and 96.2%, respectively, greater than that achieved by radiologists. Further, the specificity of TSCCN based on T1WI, T2WI-FS, and T1WI/T2WI-FS was 98.4%, 94.3%, and 99.2%, respectively, higher than that achieved by radiologists. The intra- and inter-observer ICCs of radiologists were 0.79-0.85 and 0.79-0.80 for T1WI, 0.65-0.72 and 0.70-0.74 for T2WI-FS, and 0.83-0.88 and 0.83-0.84 for T1WI/T2WI-FS.
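For readers unfamiliar with the AUC, one common way to compute it is as the probability that a randomly chosen malignant case receives a higher model score than a randomly chosen benign case (the Mann-Whitney formulation). The sketch below assumes this formulation; the abstract does not state how the AUC was computed, and the function name `roc_auc` and the scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    labels: 1 for malignant, 0 for benign.
    scores: model output, higher meaning more likely malignant.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six cases, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every malignant case outranks every benign case, while 0.5 corresponds to chance-level ranking.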

Conclusion: The TSCCN model showed better diagnostic performance than radiologists for automatically identifying benign or malignant VCFs, and is a potentially helpful tool for future clinical application.

Clinical relevance statement: TSCCN-assisted MRI assessment performed better than radiologists in distinguishing benign from malignant vertebral compression fractures. This technology has the potential to enhance diagnostic accuracy, sensitivity, and specificity. Further integration into clinical practice is required to optimize patient management.

Key points:

  • The Two-Stream Compare and Contrast Network (TSCCN) model showed better diagnostic performance than radiologists for identifying benign vs malignant vertebral compression fractures.
  • TSCCN processing is fast and stable, an advantage over the subjective evaluation by radiologists in diagnosing vertebral compression fractures.
  • The TSCCN model provides options for developing a fully automated, streamlined artificial intelligence diagnostic tool.

Keywords: Deep learning; Fractures, compression; Magnetic resonance imaging; Spine.

MeSH terms

  • Artificial Intelligence
  • Bone Diseases, Metabolic*
  • Deep Learning*
  • Fractures, Compression* / diagnosis
  • Humans
  • Magnetic Resonance Imaging / methods
  • Radiologists
  • Retrospective Studies
  • Spinal Fractures* / diagnostic imaging
  • Spinal Fractures* / pathology