Simulated clinical deployment of fully automatic deep learning for clinical prostate MRI assessment

Eur Radiol. 2021 Jan;31(1):302-313. doi: 10.1007/s00330-020-07086-z. Epub 2020 Aug 7.

Abstract

Objectives: To simulate clinical deployment, evaluate performance, and establish quality assurance of a deep learning algorithm (U-Net) for detection, localization, and segmentation of clinically significant prostate cancer (sPC), ISUP grade group ≥ 2, using bi-parametric MRI.

Methods: In 2017, 284 consecutive men in active surveillance, biopsy-naïve or pre-biopsied, received targeted and extended systematic MRI/transrectal US-fusion biopsy, after examination on a single MRI scanner (3 T). A prospective adjustment scheme was evaluated comparing the performance of the Prostate Imaging Reporting and Data System (PI-RADS) and U-Net using sensitivity, specificity, predictive values, and the Dice coefficient.
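
The Dice coefficient named above measures overlap between a predicted and a reference segmentation. The sketch below is a minimal illustration in plain Python, assuming flattened binary masks of equal length. The function name, the toy masks, and the convention for two empty masks are assumptions for illustration and are not taken from the study.

```python
def dice_coefficient(pred_mask, ref_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flattened binary masks."""
    pred = [bool(v) for v in pred_mask]
    ref = [bool(v) for v in ref_mask]
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels marked positive in both masks.
    overlap = sum(p and r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (hypothetical values, not study data).
unet_mask = [0, 1, 1, 1, 0, 0]
reference_mask = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(unet_mask, reference_mask)
print(f"Dice = {dice:.3f}")
```

A value of 1 indicates identical masks and 0 indicates no overlap.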

Results: In the 259 eligible men (median 64 [IQR 61-72] years), PI-RADS had a sensitivity of 98% [106/108]/84% [91/108] with a specificity of 17% [25/151]/58% [88/151], for thresholds at ≥ 3/≥ 4 respectively. U-Net using dynamic threshold adjustment had a sensitivity of 99% [107/108]/83% [90/108] (p > 0.99/> 0.99) with a specificity of 24% [36/151]/55% [83/151] (p > 0.99/> 0.99) for probability thresholds d3 and d4 emulating PI-RADS ≥ 3 and ≥ 4 decisions respectively, not statistically different from PI-RADS. Co-occurrence of a radiological PI-RADS ≥ 4 examination and U-Net ≥ d3 assessment significantly improved the positive predictive value from 59 to 63% (p = 0.03), on a per-patient basis.
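
The per-patient rates above follow from the reported counts (108 men with sPC, 151 without). The sketch below recomputes sensitivity and specificity from those counts, and derives predictive values on the assumption that false positives and false negatives are the complements of the reported true negatives and true positives. The dictionary layout and function name are illustrative only.

```python
N_POSITIVE = 108  # men with sPC, from the abstract
N_NEGATIVE = 151  # men without sPC, from the abstract

# (true positives, true negatives) as reported in the abstract.
reported = {
    "PI-RADS >= 3": (106, 25),
    "PI-RADS >= 4": (91, 88),
    "U-Net >= d3": (107, 36),
    "U-Net >= d4": (90, 83),
}


def diagnostic_metrics(tp, tn, n_pos=N_POSITIVE, n_neg=N_NEGATIVE):
    """Derive per-patient metrics from true-positive and true-negative counts."""
    fn = n_pos - tp  # assumed complement of true positives
    fp = n_neg - tn  # assumed complement of true negatives
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


metrics = {name: diagnostic_metrics(tp, tn) for name, (tp, tn) in reported.items()}
for name, m in metrics.items():
    print(name, {key: f"{100 * value:.1f}%" for key, value in m.items()})
```

Under that assumption, the derived positive predictive value for PI-RADS ≥ 4 is 91/154, about 59%, which agrees with the baseline value quoted above.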

Conclusions: U-Net has similar performance to PI-RADS in simulated continued clinical use. Regular quality assurance should be implemented to ensure desired performance.

Key points:

  • U-Net maintained similar diagnostic performance compared to radiological assessment of PI-RADS ≥ 4 when applied in a simulated clinical deployment.
  • Application of our proposed prospective dynamic calibration method successfully adjusted U-Net performance within acceptable limits of the PI-RADS reference over time, while not being limited to PI-RADS as a reference.
  • Simultaneous detection by U-Net and radiological assessment significantly improved the positive predictive value on a per-patient and per-lesion basis, while the negative predictive value remained unchanged.

Keywords: Artificial intelligence; Decision support systems, clinical; Deep learning; Magnetic resonance imaging; Prostate cancer.

MeSH terms

  • Deep Learning*
  • Humans
  • Image-Guided Biopsy
  • Magnetic Resonance Imaging
  • Male
  • Prospective Studies
  • Prostatic Neoplasms* / diagnostic imaging