Accuracy and Variability of Prostate Multiparametric Magnetic Resonance Imaging Interpretation Using the Prostate Imaging Reporting and Data System: A Blinded Comparison of Radiologists

Nicholas A Pickersgill; Joel M Vetter; Gerald L Andriole; Anup S Shetty; Kathryn J Fowler; Aaron J Mintz; Cary L Siegel; Eric H Kim

doi:10.1016/j.euf.2018.10.008

Eur Urol Focus. 2020 Mar 15;6(2):267-272. doi: 10.1016/j.euf.2018.10.008. Epub 2018 Oct 14.

Authors

Nicholas A Pickersgill¹, Joel M Vetter¹, Gerald L Andriole¹, Anup S Shetty², Kathryn J Fowler², Aaron J Mintz², Cary L Siegel², Eric H Kim³

Affiliations

¹ Division of Urology, Washington University School of Medicine, St. Louis, MO, USA.
² Department of Radiology, Washington University School of Medicine, St. Louis, MO, USA.
³ Division of Urology, Washington University School of Medicine, St. Louis, MO, USA. Electronic address: ehkim@wustl.edu.

PMID: 30327280
DOI: 10.1016/j.euf.2018.10.008

Abstract

Background: Multiparametric (mp) magnetic resonance imaging (MRI) has become an important tool for the detection of clinically significant prostate cancer. However, diagnostic accuracy is affected by variability between radiologists.

Objective: To determine the accuracy and variability in prostate mpMRI interpretation among radiologists, both individually and in teams, in a blinded fashion.

Design, setting, and participants: A study cohort (n=32) was created from our prospective registry of patients who received prostate mpMRI with subsequent biopsy. The cohort was then independently reviewed by four radiologists of varying levels of experience, who assigned a Prostate Imaging Reporting and Data System (PI-RADS) classification while blinded to all clinical information. Consensus interpretation by teams of two radiologists was evaluated after a 12-week wash-out period. Interpretive accuracy was calculated with various cutoffs for PI-RADS classification and Gleason score. Variability among individual radiologists and among teams was calculated using the Fleiss kappa and the intraclass correlation coefficient (ICC).
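As an illustration of how accuracy at a paired PI-RADS/Gleason cutoff might be computed, the sketch below counts a read as correct when the imaging call (PI-RADS at or above its cutoff) matches the biopsy call (Gleason score at or above its cutoff). This is an assumed definition of accuracy, not the authors' stated procedure. The function name, the use of 0 to encode a benign biopsy, and the sample scores are all hypothetical.

```python
def accuracy_at_cutoff(pirads_scores, gleason_scores, pirads_cutoff, gleason_cutoff):
    """Fraction of cases where the imaging call agrees with the biopsy call.

    Assumption: a case is imaging-positive if PI-RADS >= pirads_cutoff and
    biopsy-positive if Gleason score >= gleason_cutoff. A benign biopsy is
    encoded here as Gleason 0 (a hypothetical convention for this sketch).
    """
    if len(pirads_scores) != len(gleason_scores):
        raise ValueError("each case needs one PI-RADS score and one Gleason score")
    if not pirads_scores:
        raise ValueError("at least one case is required")

    correct = sum(
        (pirads >= pirads_cutoff) == (gleason >= gleason_cutoff)
        for pirads, gleason in zip(pirads_scores, gleason_scores)
    )
    return correct / len(pirads_scores)


if __name__ == "__main__":
    # Hypothetical reads from one radiologist; not data from the study.
    pirads = [2, 3, 4, 5, 1, 4]
    gleason = [0, 6, 7, 8, 0, 6]

    for pirads_cutoff, gleason_cutoff in [(3, 7), (4, 6)]:
        acc = accuracy_at_cutoff(pirads, gleason, pirads_cutoff, gleason_cutoff)
        print(f"PI-RADS {pirads_cutoff}+ / Gleason {gleason_cutoff}+: accuracy = {acc:.2f}")
```

Running the same function once per radiologist and per cutoff pair would give the per-reader accuracies that a between-reader comparison starts from.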

Results and limitations: Significant differences in accuracy among the four radiologists were noted when using PI-RADS 3+/Gleason 7+ (p<0.01) and PI-RADS 4+/Gleason 6+ (p=0.02) as cutoffs. At no cutoff for PI-RADS classification or Gleason score did a team read achieve higher accuracy than the most accurate radiologist. The kappa and ICC ranged from 0.22 to 0.29 for the individuals and from 0.16 to 0.21 for the teams (poor agreement). A larger sample size may be needed to adequately power comparisons of accuracy among individual radiologists.
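For readers unfamiliar with the Fleiss kappa, the following is a minimal sketch of the standard formula for agreement among a fixed number of raters assigning nominal categories. It is not the authors' analysis code, the abstract does not say which software was used, and the ratings shown are hypothetical.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss kappa for a list of cases, each rated by the same number of raters.

    ratings:    list of rows, one per case, each row holding one category per rater
    categories: all possible categories (e.g. PI-RADS 1 to 5), treated as nominal
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    category_totals = {c: 0 for c in categories}
    per_case_agreement = 0.0
    for row in ratings:
        counts = [row.count(c) for c in categories]
        if sum(counts) != n_raters:
            raise ValueError("a rating falls outside the listed categories")
        for c, k in zip(categories, counts):
            category_totals[c] += k
        # Proportion of agreeing rater pairs for this case.
        per_case_agreement += (sum(k * k for k in counts) - n_raters) / (
            n_raters * (n_raters - 1)
        )

    observed = per_case_agreement / n_cases
    # Agreement expected by chance from the overall category proportions.
    expected = sum(
        (total / (n_cases * n_raters)) ** 2 for total in category_totals.values()
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical PI-RADS reads: 5 cases, 4 radiologists each.
    reads = [
        [4, 4, 3, 5],
        [2, 3, 3, 2],
        [5, 5, 5, 4],
        [1, 2, 2, 3],
        [3, 4, 4, 4],
    ]
    print(f"Fleiss kappa = {fleiss_kappa(reads, categories=[1, 2, 3, 4, 5]):.2f}")
```

Because this statistic treats PI-RADS categories as unordered, a one-step disagreement counts the same as a four-step disagreement; the ICC reported alongside it is the measure that accounts for the ordering.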

Conclusions: At various cutoffs for PI-RADS classification and Gleason score, we find significant differences in individual radiologist accuracy, as well as poor agreement among individual radiologists. Consensus interpretations by teams of two radiologists did not improve accuracy or reduce variability.

Patient summary: This study investigated radiologist variability and differences in accuracy using multiparametric magnetic resonance imaging for the diagnosis of prostate cancer. Despite attempts to standardize interpretation within the field, we found substantial variability and significant differences in accuracy among individual radiologists.

Keywords: Accuracy; Biopsy; Diagnosis; Magnetic resonance imaging; Prostate cancer; Radiologist variability.

Publication types

Comparative Study

MeSH terms

Aged
Cohort Studies
Data Systems
Humans
Male
Middle Aged
Multiparametric Magnetic Resonance Imaging*
Observer Variation
Prostatic Neoplasms / classification
Prostatic Neoplasms / diagnostic imaging*
Prostatic Neoplasms / pathology
Radiology
Reproducibility of Results