A Deep Learning Pipeline for Grade Groups Classification Using Digitized Prostate Biopsy Specimens

Sensors (Basel). 2021 Oct 9;21(20):6708. doi: 10.3390/s21206708.

Abstract

Prostate cancer is a significant cause of morbidity and mortality in the USA. In this paper, we develop a computer-aided diagnostic (CAD) system for automated grade groups (GG) classification using digitized prostate biopsy specimens (PBSs). Our CAD system aims to firstly classify the Gleason pattern (GP), and then identifies the Gleason score (GS) and GG. The GP classification pipeline is based on a pyramidal deep learning system that utilizes three convolution neural networks (CNN) to produce both patch- and pixel-wise classifications. The analysis starts with sequential preprocessing steps that include a histogram equalization step to adjust intensity values, followed by a PBSs' edge enhancement. The digitized PBSs are then divided into overlapping patches with the three sizes: 100 × 100 (CNNS), 150 × 150 (CNNM), and 200 × 200 (CNNL), pixels, and 75% overlap. Those three sizes of patches represent the three pyramidal levels. This pyramidal technique allows us to extract rich information, such as that the larger patches give more global information, while the small patches provide local details. After that, the patch-wise technique assigns each overlapped patch a label as GP categories (1 to 5). Then, the majority voting is the core approach for getting the pixel-wise classification that is used to get a single label for each overlapped pixel. The results after applying those techniques are three images of the same size as the original, and each pixel has a single label. We utilized the majority voting technique again on those three images to obtain only one. The proposed framework is trained, validated, and tested on 608 whole slide images (WSIs) of the digitized PBSs. The overall diagnostic accuracy is evaluated using several metrics: precision, recall, F1-score, accuracy, macro-averaged, and weighted-averaged. The (CNNL) has the best accuracy results for patch classification among the three CNNs, and its classification accuracy is 0.76. The macro-averaged and weighted-average metrics are found to be around 0.70-0.77. For GG, our CAD results are about 80% for precision, and between 60% to 80% for recall and F1-score, respectively. Also, it is around 94% for accuracy and NPV. To highlight our CAD systems' results, we used the standard ResNet50 and VGG-16 to compare our CNN's patch-wise classification results. As well, we compared the GG's results with that of the previous work.

Keywords: CAD system; classification; deep learning; grade groups; prostate cancer.

MeSH terms

  • Biopsy
  • Deep Learning*
  • Humans
  • Male
  • Neoplasm Grading
  • Neural Networks, Computer
  • Prostate* / diagnostic imaging