Diagnostic Performance and Interobserver Consistency of the Prostate Imaging Reporting and Data System Version 2: A Study on Six Prostate Radiologists with Different Experiences from Half a Year to 17 Years

Chin Med J (Engl). 2018 Jul 20;131(14):1666-1673. doi: 10.4103/0366-6999.235872.

Abstract

Background: One of the main aims of the updated Prostate Imaging Reporting and Data System Version 2 (PI-RADS v2) is to diminish variation in the interpretation and reporting of prostate imaging, especially among readers with varied experience levels. This study aimed to retrospectively analyze diagnostic consistency and accuracy for prostate disease among six radiologists with different experience levels from a single center and to evaluate the diagnostic performance of PI-RADS v2 scores in the detection of clinically significant prostate cancer (PCa).

Methods: From December 2014 to March 2016, 84 PCa patients and 99 benign prostatic hyperplasia patients who underwent 3.0T multiparametric magnetic resonance imaging before biopsy were included in our study. All patients were evaluated according to the PI-RADS v2 scale (scores 1-5) by six blinded readers (with 6 months and 2, 3, 4, 5, or 17 years of experience, respectively; the last reader was a reviewer/contributor for the PI-RADS v2). The correlation between the readers' scores and the Gleason score (GS) was determined with the Kendall test. Intra-/inter-observer agreement was evaluated using κ statistics, while receiver operating characteristic curve and area under the curve (AUC) analyses were performed to evaluate the diagnostic performance of the scores.
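As an illustration of the two statistics named in the Methods, the following is a minimal sketch rather than the authors' analysis code. It assumes an unweighted Cohen's κ (the abstract does not say whether a weighted κ was used) and computes the AUC through its rank interpretation, counting ties as one half. The function names and the toy scores are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two readers' categorical scores (assumed variant)."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(a)
    # Proportion of cases on which the two readers gave the same score.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each reader's marginal score counts.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1 - expected)


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical PI-RADS v2 scores (1-5) from two readers on eight cases.
    reader_a = [1, 2, 3, 4, 5, 5, 2, 3]
    reader_b = [1, 2, 4, 4, 5, 4, 2, 3]
    cancer = [0, 0, 0, 1, 1, 1, 0, 1]  # hypothetical biopsy outcome
    print("kappa:", round(cohen_kappa(reader_a, reader_b), 3))
    print("AUC reader A:", round(auc(reader_a, cancer), 3))
```

Because PI-RADS v2 scores are ordinal, a weighted κ would penalize a 4-versus-5 disagreement less than a 1-versus-5 disagreement; the unweighted form above treats all disagreements equally.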

Results: Based on the PI-RADS v2, the median κ score and standard error among all possible pairs of readers were 0.506 and 0.043, respectively; the average correlation between the six readers' scores and the GS was positive, exhibiting weak-to-moderate strength (r = 0.391, P = 0.006). The AUC values of the six radiologists were 0.883, 0.924, 0.927, 0.932, 0.929, and 0.947, respectively.
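To make "all possible pairs of readers" concrete: six readers give 6 × 5 / 2 = 15 reader pairs, and the reported value is the median of the 15 pairwise agreement values. The sketch below shows that bookkeeping under stated assumptions. The reader labels and scores are hypothetical, and plain percent agreement stands in for the agreement statistic only to keep the example short; any pairwise statistic, such as Cohen's κ, can be passed in instead.

```python
from itertools import combinations
from statistics import median


def percent_agreement(a, b):
    """Stand-in agreement statistic: share of cases scored identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_median(scores_by_reader, agreement):
    """Return (number of reader pairs, median agreement across those pairs)."""
    pairs = list(combinations(sorted(scores_by_reader), 2))
    values = [agreement(scores_by_reader[r1], scores_by_reader[r2])
              for r1, r2 in pairs]
    return len(pairs), median(values)


# Hypothetical PI-RADS v2 scores for six readers on five cases (not study data).
toy_readers = {
    "R1": [1, 2, 3, 4, 5],
    "R2": [1, 2, 4, 4, 5],
    "R3": [2, 2, 3, 4, 5],
    "R4": [1, 3, 3, 5, 5],
    "R5": [1, 2, 3, 4, 4],
    "R6": [1, 2, 3, 4, 5],
}

if __name__ == "__main__":
    n_pairs, med = pairwise_median(toy_readers, percent_agreement)
    print("reader pairs:", n_pairs)
    print("median pairwise agreement:", med)
```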

Conclusion: The inter-reader agreement for the PI-RADS v2 among the six readers with different levels of experience is weak to moderate. Reader experience affects the interpretation of MR images.

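Chinese abstract (translated): Evaluation of the diagnostic performance of PI-RADS v2 and of diagnostic consistency among six prostate radiologists with different experience levels (from half a year to 17 years)

Background: One of the main aims of the latest version of the Prostate Imaging Reporting and Data System (PI-RADS v2) is to reduce variation among radiologists in the interpretation of prostate images, especially among radiologists with different experience levels. This study aimed to retrospectively analyze the consistency and accuracy of six radiologists with different experience levels in diagnosing prostate disease and to evaluate the diagnostic performance of PI-RADS v2 in detecting clinically significant prostate cancer.

Methods: A total of 183 patients (from December 2014 to March 2016) who underwent 3.0T multiparametric magnetic resonance imaging (mp-MRI) before prostate biopsy were included, comprising 84 with prostate cancer (PCa) and 99 with benign prostatic hyperplasia (BPH). Six radiologists with different experience levels (6 months and 2, 3, 4, 5, and 17 years, respectively; the last had taken part in drafting and discussing PI-RADS v2) each scored all patients on the PI-RADS v2 scale (1-5). The Kendall correlation coefficient was used to analyze the correlation between the readers' scores and the Gleason score (GS); the kappa test was used to assess intra- and inter-reader agreement; and ROC curve and area under the curve (AUC) analyses were used to evaluate the diagnostic performance of the scores.

Results: Based on PI-RADS v2, the mean inter-reader agreement among the six readers and its standard error were 0.506 and 0.043, respectively; the six readers' scores correlated positively with the GS, with a mean correlation coefficient of r = 0.319 (P = 0.006), indicating a weak-to-moderate correlation. The AUC values of the six readers were 0.883, 0.924, 0.927, 0.932, 0.929, and 0.947, respectively.

Conclusion: The mean agreement among the six radiologists with different experience levels was weak to moderate; different experience levels therefore have some effect on the interpretation of MRI images.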

Keywords: Benign Prostatic Hyperplasia; Diagnosis; Magnetic Resonance Imaging; Prostate Cancer; Prostate Imaging Reporting and Data System Version 2.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Humans
  • Magnetic Resonance Imaging*
  • Male
  • Middle Aged
  • Neoplasm Grading
  • Prostatic Neoplasms / diagnostic imaging*
  • Retrospective Studies