Impact of prevalence and case distribution in lab-based diagnostic imaging studies

Brandon D Gallas; Weijie Chen; Elodia Cole; Robert Ochs; Nicholas Petrick; Etta D Pisano; Berkman Sahiner; Frank W Samuelson; Kyle J Myers

doi:10.1117/1.JMI.6.1.015501

J Med Imaging (Bellingham). 2019 Jan;6(1):015501. doi: 10.1117/1.JMI.6.1.015501. Epub 2019 Jan 21.

Authors

Brandon D Gallas¹, Weijie Chen¹, Elodia Cole², Robert Ochs³, Nicholas Petrick¹, Etta D Pisano², Berkman Sahiner¹, Frank W Samuelson¹, Kyle J Myers¹

Affiliations

¹ FDA/CDRH/OSEL/Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States.
² Medical University of South Carolina, Charleston, South Carolina, United States.
³ FDA/CDRH/OIR/Division of Radiological Health, Silver Spring, Maryland, United States.

Abstract

We investigated effects of prevalence and case distribution on radiologist diagnostic performance as measured by area under the receiver operating characteristic curve (AUC) and sensitivity-specificity in lab-based reader studies evaluating imaging devices. Our retrospective reader studies compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with dense breasts. Mammograms were acquired from the prospective Digital Mammographic Imaging Screening Trial. We performed five reader studies that differed in terms of cancer prevalence and the distribution of noncancers. Twenty radiologists participated in each reader study. Using split-plot study designs, we collected recall decisions and multilevel scores from the radiologists for calculating sensitivity, specificity, and AUC. Differences in reader-averaged AUCs slightly favored SFM over FFDM (biggest AUC difference: 0.047, $SE = 0.023$, $p = 0.047$), where standard error accounts for reader and case variability. The differences were not significant at a level of 0.01 (0.05/5 reader studies). The differences in sensitivities and specificities were also indeterminate. Prevalence had little effect on AUC (largest difference: 0.02), whereas sensitivity increased and specificity decreased as prevalence increased. We found that AUC is robust to changes in prevalence, while radiologists were more aggressive with recall decisions as prevalence increased.
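The contrast between AUC and sensitivity-specificity can be illustrated with a minimal Python sketch. It is not the multiple-reader, multiple-case analysis used in the study: the scores are made-up values on a hypothetical 1-to-5 scale, the function names are illustrative, and the empirical (Mann-Whitney) AUC for a single reader stands in for the reader-averaged AUC. The sketch shows that enriching the cancer fraction without changing the score distributions leaves the empirical AUC unchanged, that a lower recall threshold raises sensitivity and lowers specificity, and that the reported $p = 0.047$ does not fall below the 0.05/5 level.

```python
from itertools import product


def empirical_auc(cancer_scores, noncancer_scores):
    """Mann-Whitney AUC: P(cancer score > noncancer score) + 0.5 * P(tie)."""
    wins = 0.0
    for c, n in product(cancer_scores, noncancer_scores):
        if c > n:
            wins += 1.0
        elif c == n:
            wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


def sensitivity_specificity(cancer_scores, noncancer_scores, threshold):
    """Recall a case when its score is at or above the threshold."""
    sens = sum(s >= threshold for s in cancer_scores) / len(cancer_scores)
    spec = sum(s < threshold for s in noncancer_scores) / len(noncancer_scores)
    return sens, spec


# Hypothetical multilevel scores (assumption: higher means more suspicious).
cancer = [3, 4, 5, 5, 2]
noncancer = [1, 2, 2, 3, 1, 4, 1, 2]

# Same score distributions, different prevalence: 5/13 versus 20/28 cancers.
auc_low_prevalence = empirical_auc(cancer, noncancer)
auc_high_prevalence = empirical_auc(cancer * 4, noncancer)

# A more aggressive reader recalls at a lower threshold.
sens_strict, spec_strict = sensitivity_specificity(cancer, noncancer, threshold=3)
sens_aggressive, spec_aggressive = sensitivity_specificity(cancer, noncancer, threshold=2)

# Significance level after dividing 0.05 across the five reader studies.
alpha_per_study = 0.05 / 5
significant = 0.047 < alpha_per_study

print(auc_low_prevalence, auc_high_prevalence)
print((sens_strict, spec_strict), (sens_aggressive, spec_aggressive))
print(alpha_per_study, significant)
```

In this toy setting the prevalence change alone does nothing to AUC, so any shift in sensitivity and specificity has to come from where the recall threshold sits, which is consistent with the abstract's description of radiologists becoming more aggressive with recall decisions as prevalence increased.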

Keywords: area under the receiver operating characteristic curve; image evaluation; multiple-reader, multiple-case analysis; sensitivity; specificity; study design.