Machine learning approaches to analyze histological images of tissues from radical prostatectomies

Comput Med Imaging Graph. 2015 Dec;46 Pt 2(Pt 2):197-208. doi: 10.1016/j.compmedimag.2015.08.002. Epub 2015 Aug 20.

Abstract

Computerized evaluation of histological preparations of prostate tissues involves identification of tissue components such as stroma (ST), benign/normal epithelium (BN) and prostate cancer (PCa). Image classification approaches have been developed to identify and classify glandular regions in digital images of prostate tissues; however, their success has been limited by difficulties in cellular segmentation and by tissue heterogeneity. We hypothesized that intensity histograms of the hematoxylin (H) and eosin (E) stains, computed from the pixels of the stain channels deconvoluted from H&E images, numerically capture the architectural difference between glands and stroma. In addition, we postulated that joint histograms of local binary patterns and local variance (LBPxVAR) can be used as sensitive textural features to differentiate benign/normal tissue from cancer. Here we used a machine learning approach comprising a support vector machine (SVM) followed by a random forest (RF) classifier to digitally stratify prostate tissue into ST, BN and PCa areas. Two pathologists manually annotated 210 images of low- and high-grade tumors from slides that were selected from 20 radical prostatectomies and digitized at high resolution. The 210 images were split into training (n=19) and test (n=191) sets. Local intensity histograms of H and E were used to train an SVM classifier to separate ST from epithelium (BN+PCa). The performance of the SVM prediction was evaluated by measuring the accuracy of delineating epithelial areas. Averaged over the test set, the Jaccard (J=59.5 ± 14.6) and Rand (Ri=62.0 ± 7.5) indices indicated significantly better prediction than a reference method (Chen et al., Clinical Proteomics 2013, 10:18).
To distinguish BN from PCa, we trained an RF classifier with LBPxVAR and local intensity histograms and obtained separate performance values for BN and PCa: JBN=35.2 ± 24.9, OBN=49.6 ± 32, JPCa=49.5 ± 18.5, OPCa=72.7 ± 14.8 and Ri=60.6 ± 7.6 in the test set. Our pixel-based classification does not rely on the detection of lumens, which is prone to errors and has limitations in high-grade cancers. It has the potential to aid clinical studies in which quantification of tumor content is necessary to prognosticate the course of the disease. The image data set with ground truth annotation is available for public use to stimulate further research in this area.
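
For readers unfamiliar with the Jaccard index used above to score the predicted areas, the following is a minimal sketch of how it can be computed for a pair of binary masks. It illustrates the standard definition (intersection over union) and is not the authors' evaluation code. The function name `jaccard_index`, the nested-list mask format and the convention for two empty masks are assumptions made for this example. The sketch returns a fraction between 0 and 1, whereas the values in the abstract appear to be on a 0 to 100 scale.

```python
def jaccard_index(pred_mask, truth_mask):
    """Jaccard index |A and B| / |A or B| for two binary masks.

    Masks are nested lists (rows of 0/1 values), a hypothetical format
    chosen to keep the example dependency-free.
    """
    # Flatten both masks into lists of booleans.
    pred = [bool(v) for row in pred_mask for v in row]
    truth = [bool(v) for row in truth_mask for v in row]
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels labeled positive in both masks and in at least one mask.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)

    # Assumed convention: two empty masks agree perfectly.
    if union == 0:
        return 1.0
    return intersection / union


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]   # toy predicted epithelium mask
    annotated = [[1, 0], [1, 0]]   # toy pathologist annotation
    print(jaccard_index(predicted, annotated))
```

In the toy example, one pixel is shared and three pixels are positive in at least one mask, so the index is one third. A per-class score such as JBN or JPCa would be obtained by building the masks from the pixels assigned to that class.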

Keywords: Image analysis; Machine learning; Prostate cancer; Tissue classification; Tissue quantification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Epithelial Cells / pathology*
  • Humans
  • Image Enhancement / methods
  • Image Interpretation, Computer-Assisted / methods
  • Machine Learning
  • Male
  • Microscopy / methods*
  • Pattern Recognition, Automated / methods*
  • Prostatectomy / methods
  • Prostatic Neoplasms / pathology*
  • Prostatic Neoplasms / surgery*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Stromal Cells / pathology*
  • Treatment Outcome