External Validation of the Spinal Orthopedic Research Group Index for Spinal Epidural Abscess 90-day Mortality in a Geographically Remote Population

Spine (Phila Pa 1976). 2024 Jan 3. doi: 10.1097/BRS.0000000000004912. Online ahead of print.

Abstract

Study design: Retrospective cohort study.

Objective: To externally validate the Spinal Orthopaedic Research Group (SORG) index for predicting 90-day mortality from Spinal Epidural Abscess (SEA) and compare its utility to the 11-item modified frailty index (mFI-11) and Charlson Comorbidity Index (CCI).

Summary of background data: Providing a mortality estimate may guide informed decision-making by patients and clinicians. Several prognostic tools and calculators are available to help predict the risk of mortality from SEA, including the SORG index, which estimates 90-day post-discharge mortality. External validation is essential before any clinical prediction tool is adopted more widely.

Methods: Patients were identified using hospital coding, and the diagnosis was confirmed from medical and radiological records. Mortality data and the data needed to calculate the SORG index, mFI-11 and CCI were collected. Discrimination was assessed using the area under the curve (AUC), and calibration using calibration plots.
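As a minimal sketch of the discrimination analysis, assuming the AUC is the usual area under the receiver operating characteristic curve, the function below computes it from its rank interpretation (the Mann-Whitney statistic): the probability that a randomly chosen patient who died within 90 days was assigned a higher predicted risk than a randomly chosen survivor. The function name `auc_mann_whitney` and the toy inputs are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the fraction of (death, survivor) pairs in which the patient who
    died (y=1) received the higher score; tied scores count as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]  # scores of deaths
    neg = [s for y, s in zip(y_true, y_score) if y == 0]  # scores of survivors
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 3 of the 4 death/survivor pairs are ranked correctly.
print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, so the CCI's AUC of 0.49 indicates essentially no discrimination, whereas values of 0.81 and 0.84 mean the SORG index and mFI-11 ordered roughly four in five death/survivor pairs correctly. The pairwise loop is quadratic in cohort size, which is negligible for 150 patients.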

Results: A total of 150 patients were included: 58 (39%) were female, and the median age was 63 years. There were 15 deaths (10%) at 90 days post-discharge and 20 (13%) at one year. The mean SORG index was 13.6%, the mean CCI 2.75, and the mean mFI-11 1.34. The SORG index (P=0.0006) and mFI-11 (P<0.0001) were associated with 90-day mortality. The AUCs for the SORG index, mFI-11, and CCI were 0.81, 0.84, and 0.49, respectively. The calibration curve for the SORG index showed slight overestimation in the middle range of predicted probabilities, more so than the mFI-11, and the index was poorly calibrated at higher predicted probabilities.
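Calibration asks a different question from discrimination: whether predicted risks match observed event rates. Below is a minimal sketch of the grouping behind a calibration plot, assuming equal-width probability bins on [0, 1]; the abstract does not state how the study grouped predictions, and `calibration_table` and its toy inputs are hypothetical. A bin whose observed rate falls below its mean predicted risk (a point below the diagonal) indicates overestimation.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width bins on [0, 1] (an assumed binning scheme).

    Returns one (mean predicted risk, observed event rate, n) tuple per
    non-empty bin; plotting observed against predicted gives a calibration plot.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predicted probabilities must lie in [0, 1]")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((y, p))
    table = []
    for members in bins:
        if members:  # skip empty bins rather than report 0/0
            n = len(members)
            mean_pred = sum(p for _, p in members) / n
            observed = sum(y for y, _ in members) / n
            table.append((mean_pred, observed, n))
    return table


# Toy example (hypothetical data): two bins, each holding three patients.
for mean_pred, observed, n in calibration_table(
    [0, 0, 1, 0, 1, 1], [0.05, 0.1, 0.15, 0.5, 0.55, 0.9], n_bins=2
):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={n}")
```

With only 15 deaths among 150 patients, each bin contains few events, so observed rates in sparsely populated high-risk bins are imprecise.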

Conclusions: This study externally validated the SORG index, demonstrating its utility for predicting 90-day mortality in our population; however, it was less well calibrated than the mFI-11. Variations in algorithm performance may result from differences in socioethnic composition and health resources between the development and validation centres. Continued multicentre data input may help refine such algorithms and improve their generalisability.