Automated Detection of Cervical Spinal Stenosis and Cord Compression via Vision Transformer and Rules-Based Classification

AJNR Am J Neuroradiol. 2024 Feb 15. doi: 10.3174/ajnr.A8141. Online ahead of print.

Abstract

Background and purpose: Cervical spinal cord compression, defined as spinal cord deformity and severe narrowing of the spinal canal in the cervical region, can lead to severe clinical consequences, including intractable pain, sensory disturbance, paralysis, and even death, and may require emergent intervention to prevent negative outcomes. Despite the critical nature of cord compression, no automated tool is available to alert clinical radiologists to the presence of such findings. This study aims to demonstrate the ability of a vision transformer (ViT) model for the accurate detection of cervical cord compression.

Materials and methods: A clinically diverse cohort of 142 cervical spine MRIs was identified, 34% of which were normal or had mild stenosis, 31% with moderate stenosis, and 35% with cord compression. Utilizing gradient-echo images, slices were labeled as no cord compression/mild stenosis, moderate stenosis, or severe stenosis/cord compression. Segmentation of the spinal canal was performed and confirmed by neuroradiology faculty. A pretrained ViT model was fine-tuned to predict section-level severity by using a train:validation:test split of 60:20:20. Each examination was assigned an overall severity based on the highest level of section severity, with an examination labeled as positive for cord compression if ≥1 section was predicted in the severe category. Additionally, 2 convolutional neural network (CNN) models (ResNet50, DenseNet121) were tested in the same manner.

Results: The ViT model outperformed both CNN models at the section level, achieving section-level accuracy of 82%, compared with 72% and 78% for ResNet and DenseNet121, respectively. ViT patient-level classification achieved accuracy of 93%, sensitivity of 0.90, positive predictive value of 0.90, specificity of 0.95, and negative predictive value of 0.95. Receiver operating characteristic area under the curve was greater for ViT than either CNN.

Conclusions: This classification approach using a ViT model and rules-based classification accurately detects the presence of cervical spinal cord compression at the patient level. In this study, the ViT model outperformed both conventional CNN approaches at the section and patient levels. If implemented into the clinical setting, such a tool may streamline neuroradiology workflow, improving efficiency and consistency.