Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment

Patrick Schelb; Simon Kohl; Jan Philipp Radtke; Manuel Wiesenfarth; Philipp Kickingereder; Sebastian Bickelhaupt; Tristan Anselm Kuder; Albrecht Stenzinger; Markus Hohenfellner; Heinz-Peter Schlemmer; Klaus H Maier-Hein; David Bonekamp

doi:10.1148/radiol.2019190938

Radiology. 2019 Dec;293(3):607-617. doi: 10.1148/radiol.2019190938. Epub 2019 Oct 8.

Affiliation

¹ From the Division of Radiology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany (P.S., J.P.R., P.K., H.P.S., D.B.); Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany (S.K., K.H.M.H.); Department of Urology, University of Heidelberg Medical Center, Heidelberg, Germany (J.P.R., M.H.); Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany (M.W.); Department of Neuroradiology, University of Heidelberg Medical Center, Heidelberg, Germany (P.K.); Junior Group Medical Imaging and Radiology-Cancer Prevention, German Cancer Research Center (DKFZ), Heidelberg, Germany (S.B.); Division of Medical Physics, German Cancer Research Center (DKFZ), Heidelberg, Germany (T.A.K.); Institute of Pathology, University of Heidelberg Medical Center, Heidelberg, Germany (A.S.); and German Cancer Consortium (DKTK), Heidelberg, Germany (H.P.S., K.H.M.H., D.B.).

PMID: 31592731

Abstract

Background: Men suspected of having clinically significant prostate cancer (sPC) increasingly undergo prostate MRI. The potential of deep learning to provide diagnostic support for human interpretation requires further evaluation.

Purpose: To compare the performance of clinical assessment to a deep learning system optimized for segmentation trained with T2-weighted and diffusion MRI in the task of detection and segmentation of lesions suspicious for sPC.

Materials and Methods: In this retrospective study, T2-weighted and diffusion prostate MRI sequences from consecutive men examined with a single 3.0-T MRI system between 2015 and 2016 were manually segmented. Ground truth was provided by combined targeted and extended systematic MRI-transrectal US fusion biopsy, with sPC defined as International Society of Urological Pathology Gleason grade group greater than or equal to 2. By using split-sample validation, U-Net was internally validated on the training set (80% of the data) through cross validation and subsequently externally validated on the test set (20% of the data). U-Net-derived sPC probability maps were calibrated by matching sextant-based cross-validation performance to clinical performance of Prostate Imaging Reporting and Data System (PI-RADS). Performance of PI-RADS and U-Net were compared by using sensitivities, specificities, predictive values, and Dice coefficient.

Results: A total of 312 men (median age, 64 years; interquartile range [IQR], 58-71 years) were evaluated. The training set consisted of 250 men (median age, 64 years; IQR, 58-71 years) and the test set of 62 men (median age, 64 years; IQR, 60-69 years). In the test set, PI-RADS cutoffs greater than or equal to 3 versus cutoffs greater than or equal to 4 on a per-patient basis had sensitivity of 96% (25 of 26) versus 88% (23 of 26) at specificity of 22% (eight of 36) versus 50% (18 of 36). U-Net at probability thresholds of greater than or equal to 0.22 versus greater than or equal to 0.33 had sensitivity of 96% (25 of 26) versus 92% (24 of 26) (both P > .99) with specificity of 31% (11 of 36) versus 47% (17 of 36) (both P > .99), not statistically different from PI-RADS. Dice coefficients were 0.89 for prostate and 0.35 for MRI lesion segmentation. In the test set, coincidence of PI-RADS greater than or equal to 4 with U-Net lesions improved the positive predictive value from 48% (28 of 58) to 67% (24 of 36) for U-Net probability thresholds greater than or equal to 0.33 (P = .01), while the negative predictive value remained unchanged (83% [25 of 30] vs 83% [43 of 52]; P > .99).

Conclusion: U-Net trained with T2-weighted and diffusion MRI achieves similar performance to clinical Prostate Imaging Reporting and Data System assessment.

© RSNA, 2019
Online supplemental material is available for this article. See also the editorial by Padhani and Turkbey in this issue.
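
As a reading aid, the per-patient percentages reported in the Results can be reproduced from the stated counts, and the Dice coefficient can be illustrated on a toy example. The following is a minimal sketch, not the authors' analysis code: the function names and the flat binary-mask representation are assumptions made for illustration, and the study itself computed Dice on MRI segmentations that are not available here.

```python
def sensitivity(true_pos, diseased_total):
    """Fraction of men with sPC who were called positive."""
    return true_pos / diseased_total


def specificity(true_neg, healthy_total):
    """Fraction of men without sPC who were called negative."""
    return true_neg / healthy_total


def dice(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two equal-length 0/1 masks.

    Assumption: two empty masks are treated as perfect agreement (1.0).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 1.0 if size_sum == 0 else 2 * overlap / size_sum


# Per-patient test-set counts quoted in the abstract:
# (label, detected sPC, men with sPC, correct negatives, men without sPC)
reported = [
    ("PI-RADS >= 3", 25, 26, 8, 36),
    ("PI-RADS >= 4", 23, 26, 18, 36),
    ("U-Net >= 0.22", 25, 26, 11, 36),
    ("U-Net >= 0.33", 24, 26, 17, 36),
]

for label, tp, n_pos, tn, n_neg in reported:
    sens = sensitivity(tp, n_pos)
    spec = specificity(tn, n_neg)
    print(f"{label}: sensitivity {sens:.0%}, specificity {spec:.0%}")

# Toy masks (hypothetical, not study data): one overlapping voxel out of two each.
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))
```

The predictive values in the abstract (for example, 28 of 58) use different denominators than the per-patient counts, so they are not derived from the table in this sketch.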

MeSH terms

Aged
Biopsy
Deep Learning*
Humans
Magnetic Resonance Imaging*
Male
Middle Aged
Predictive Value of Tests
Prostatic Neoplasms / diagnostic imaging
Prostatic Neoplasms / pathology*
Retrospective Studies
Sensitivity and Specificity