Interrater reliability of the categorization of late radiographic changes after lung stereotactic body radiation therapy

Salman Faruqi; Meredith E Giuliani; Hamid Raziee; Mei Ling Yap; Heidi Roberts; Lisa W Le; Anthony Brade; John Cho; Alexander Sun; Andrea Bezjak; Andrew J Hope

doi:10.1016/j.ijrobp.2014.04.042

Int J Radiat Oncol Biol Phys. 2014 Aug 1;89(5):1076-1083. doi: 10.1016/j.ijrobp.2014.04.042. Epub 2014 Jul 8.

Affiliations

¹ Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON, Canada.
² Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON, Canada. Electronic address: meredith.giuliani@rmp.uhn.on.ca.
³ Department of Radiology, University Health Network, Toronto, Ontario, Canada.
⁴ Department of Biostatistics, Princess Margaret Cancer Centre, Toronto, Ontario, Canada.

PMID: 25035211

Abstract

Purpose: Radiographic changes after lung stereotactic body radiation therapy (SBRT) have been categorized into 4 groups: modified conventional pattern (A), mass-like fibrosis (B), scar-like fibrosis (C), and no evidence of increased density (D). The purpose of this study was to assess the interrater reliability of this categorization system in patients with early-stage non-small cell lung cancer (NSCLC).

Methods and materials: Seventy-seven patients were included in this study, all treated with SBRT for early-stage (T1/2) NSCLC at a single institution, with a minimum follow-up of 6 months. Six experienced clinicians familiar with post-SBRT radiographic changes independently scored the serial posttreatment CT images in a blinded fashion. The proportion of patients categorized as A, B, C, or D at each interval was determined. Krippendorff's alpha (KA), multirater kappa (M-kappa), and Gwet's AC1 (AC1) scores were used to establish interrater reliability. A leave-one-out analysis was performed to demonstrate the variability among raters. Interrater agreement for the first and last 20 patients scored was calculated to explore whether a training effect existed.
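To illustrate how two of the chance-corrected agreement statistics named above behave, the following is a minimal sketch, not the authors' analysis code. It assumes that "multirater kappa" refers to Fleiss' kappa (the abstract does not say which variant was used), that every scan is scored by the same number of raters, and that no ratings are missing. The function names and the toy ratings are hypothetical. Krippendorff's alpha, which also handles missing ratings, is not implemented here.

```python
from collections import Counter

# The four post-SBRT patterns from the abstract.
CATEGORIES = ("A", "B", "C", "D")


def count_table(ratings, categories=CATEGORIES):
    """Convert per-scan lists of rater labels into per-scan category counts."""
    table = []
    for scan in ratings:
        counts = Counter(scan)
        table.append([counts.get(c, 0) for c in categories])
    return table


def observed_agreement(table):
    """Mean proportion of rater pairs that agree, averaged over scans."""
    total = 0.0
    for row in table:
        n = sum(row)  # number of raters for this scan
        total += sum(c * (c - 1) for c in row) / (n * (n - 1))
    return total / len(table)


def category_prevalence(table):
    """Overall share of ratings falling in each category.

    Assumes every scan has the same number of raters.
    """
    n_ratings = sum(sum(row) for row in table)
    return [sum(row[k] for row in table) / n_ratings for k in range(len(table[0]))]


def fleiss_kappa(table):
    """Fleiss' kappa: chance agreement is the sum of squared prevalences."""
    p_o = observed_agreement(table)
    p_e = sum(p * p for p in category_prevalence(table))
    return (p_o - p_e) / (1 - p_e)


def gwet_ac1(table):
    """Gwet's AC1: chance agreement is sum(p * (1 - p)) / (q - 1)."""
    p_o = observed_agreement(table)
    prevalence = category_prevalence(table)
    q = len(prevalence)
    p_e = sum(p * (1 - p) for p in prevalence) / (q - 1)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: 3 scans, each scored by 6 raters.
    toy = [
        ["A", "A", "A", "A", "B", "D"],
        ["A", "B", "B", "C", "C", "D"],
        ["D", "D", "D", "D", "A", "A"],
    ]
    table = count_table(toy)
    print("Fleiss' kappa:", round(fleiss_kappa(table), 3))
    print("Gwet's AC1:  ", round(gwet_ac1(table), 3))
```

Both statistics share the same observed agreement and differ only in how chance agreement is estimated, which is one reason AC1 and kappa can diverge when category prevalence is uneven.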

Results: The number of ratings ranged from 450 at 6 months to 84 at 48 months of follow-up. The proportion of patients in each category was as follows: A, 45%; B, 16%; C, 13%; and D, 26%. KA and M-kappa ranged from 0.17 to 0.34. AC1 ranged from 0.22 to 0.48. KA increased from 0.24 to 0.36 at 12 months with training. The percent agreement for pattern A peaked at 12 months, with a 54% chance of having >50% of raters in agreement, and decreased over time, whereas that for patterns B and C increased over time to a maximum of 20% and 22%, respectively.

Conclusion: This post-SBRT radiographic change categorization system has modest interrater agreement, and there is a suggestion of a training effect. Patterns of fibrosis evolve after SBRT, and alternative categorization systems should be evaluated.

MeSH terms

Aged
Aged, 80 and over
Carcinoma, Non-Small-Cell Lung / pathology
Carcinoma, Non-Small-Cell Lung / surgery*
Female
Humans
Lung / diagnostic imaging*
Lung / radiation effects*
Lung Neoplasms / pathology
Lung Neoplasms / surgery*
Male
Middle Aged
Neoplasm Recurrence, Local / diagnostic imaging
Neoplasm Staging
Observer Variation
Radiation Pneumonitis / diagnostic imaging*
Radiation Pneumonitis / epidemiology
Radiosurgery / adverse effects*
Radiosurgery / methods
Reproducibility of Results
Tomography, X-Ray Computed
Video Recording