Impact of Interobserver Variability in Manual Segmentation of Non-Small Cell Lung Cancer (NSCLC) Applying Low-Rank Radiomic Representation on Computed Tomography

Michelle Hershman; Bardia Yousefi; Lacey Serletti; Maya Galperin-Aizenberg; Leonid Roshkovan; José Marcio Luna; Jeffrey C Thompson; Charu Aggarwal; Erica L Carpenter; Despina Kontos; Sharyn I Katz

doi:10.3390/cancers13235985

Cancers (Basel). 2021 Nov 28;13(23):5985. doi: 10.3390/cancers13235985.

Affiliations

¹ Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA.
² Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA.
³ Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
⁴ Section of Interventional Pulmonology, Department of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
⁵ Division of Hematology and Oncology, Department of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Abstract

This study examines interobserver variability in manual segmentation of non-small cell lung cancer (NSCLC) with respect to the readers' specialty training. Four readers performed segmentations for a total of 293 patients from two publicly available databases: a data scientist (BY), a medical student (LS), a radiology trainee (MH), and a specialty-trained radiologist (SK). Sørensen-Dice (SD) coefficients and low-rank Pearson correlation coefficients (CC) of 429 radiomic features were calculated to assess interobserver variability. Cox proportional hazards (CPH) models and Kaplan-Meier (KM) curves for overall survival (OS) prediction were also generated for each dataset. Each reader's segmentations showed high similarity to those of the specialty-trained radiologist, with the following values averaged over both databases: SD: 0.79 and CC: 0.92 (BY-SK), SD: 0.81 and CC: 0.83 (LS-SK), and SD: 0.84 and CC: 0.91 (MH-SK). For OS, the maximal CPH model, which combined radiomic and clinical variables (sex, stage/morphological status, and histology), yielded c-statistics of 0.7 (95% CI) and 0.69 (95% CI) for the two datasets. KM curves also showed significant discrimination between high- and low-risk patients (p-value < 0.005). These results suggest that readers' level of training and clinical experience may not significantly influence the ability to extract accurate radiomic features for NSCLC on CT. This potentially allows flexibility in the training required to produce robust prognostic imaging biomarkers for clinical translation.
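The agreement measures and the survival c-statistic named in the abstract can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' pipeline: all function names are hypothetical, Harrell's concordance index is used as the c-statistic, and the "low-rank" step is approximated by pooled z-scoring followed by a truncated SVD of each reader's patient-by-feature matrix, because the abstract does not specify the exact low-rank representation.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Sørensen-Dice coefficient of two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def low_rank_feature_correlation(feats_a, feats_b, rank=10):
    """Pearson CC between two readers' radiomic feature matrices.

    ASSUMPTION: the low-rank representation is approximated here by a
    rank-`rank` truncated SVD; rows are patients, columns are features,
    in the same order for both readers.
    """
    a = np.asarray(feats_a, dtype=float)
    b = np.asarray(feats_b, dtype=float)
    # Put features on a common scale using statistics pooled over both readers.
    pooled = np.vstack([a, b])
    mu = pooled.mean(axis=0)
    sd = pooled.std(axis=0)
    sd[sd == 0] = 1.0  # constant features: avoid division by zero
    a = (a - mu) / sd
    b = (b - mu) / sd

    def truncate(m):
        # Keep only the `rank` largest singular components of m.
        u, s, vt = np.linalg.svd(m, full_matrices=False)
        k = min(rank, s.size)
        return (u[:, :k] * s[:k]) @ vt[:k]

    return float(np.corrcoef(truncate(a).ravel(), truncate(b).ravel())[0, 1])


def harrell_c_index(time, event, risk):
    """Harrell's C: share of comparable patient pairs in which the patient
    with the higher predicted risk has the shorter observed survival."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an observed death
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    a = [[1, 1, 0], [0, 1, 0]]
    b = [[1, 0, 0], [0, 1, 1]]
    print(dice_coefficient(a, b))  # 2 * 2 / (3 + 3)
    print(harrell_c_index([1, 2, 3], [1, 1, 1], [3, 2, 1]))
```

In practice, each reader's segmentation would be compared with the specialty-trained radiologist's (for example BY-SK) by averaging `dice_coefficient` over patients, and the risk scores passed to `harrell_c_index` would come from a fitted CPH model; a dedicated survival library would normally replace the quadratic loop for larger cohorts.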

Keywords: computed tomography (CT); interobserver variability; non-small cell lung cancer; radiomics.
