Generating region proposals for histopathological whole slide image retrieval

Comput Methods Programs Biomed. 2018 Jun:159:1-10. doi: 10.1016/j.cmpb.2018.02.020. Epub 2018 Feb 23.

Abstract

Background and objective: Content-based image retrieval is an effective method for histopathological image analysis. However, given a database of very large whole slide images (WSIs), acquiring appropriate regions of interest (ROIs) for training is important but difficult. Moreover, histopathological images can only be annotated by pathologists, so labeled data are scarce. Therefore, generating ROIs from WSIs and retrieving images with few labels is an important and challenging task.

Methods: This paper presents a novel unsupervised region proposal method for histopathological WSIs based on Selective Search. Specifically, the WSI is over-segmented into regions, which are hierarchically merged until the WSI becomes a single region. Nucleus-oriented similarity measures for region merging and a Nucleus-Cytoplasm color space for histopathological images are specially defined to generate accurate region proposals. Additionally, we propose a new semi-supervised hashing method for image retrieval. The semantic features of images are extracted with Latent Dirichlet Allocation and transformed into binary hash codes with Supervised Hashing.
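
The hierarchical merging step can be illustrated with a minimal Python sketch of Selective Search-style greedy grouping. It is an illustration under stated assumptions, not the paper's implementation: the function names, the (size, histogram) region representation, and the histogram-intersection similarity are placeholders, and the nucleus-oriented similarity measures and Nucleus-Cytoplasm color space are not reproduced because the abstract does not define them.

```python
from itertools import count


def hist_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def hierarchical_merge(regions, neighbors, similarity=hist_intersection):
    """Greedily merge the most similar adjacent regions.

    regions:   dict mapping region id -> (size, normalized histogram)
    neighbors: set of (id_a, id_b) tuples with id_a < id_b for adjacent regions
    Returns (proposals, regions): every region id that ever existed, in
    creation order, and the data for all of them. With a connected adjacency
    graph the last proposal covers the whole image.
    """
    regions = dict(regions)
    neighbors = set(neighbors)
    next_id = count(max(regions) + 1)
    proposals = list(regions)  # the initial over-segmentation counts too

    while neighbors:
        # Pick the adjacent pair with the highest similarity.
        a, b = max(
            neighbors,
            key=lambda p: similarity(regions[p[0]][1], regions[p[1]][1]),
        )
        size_a, hist_a = regions[a]
        size_b, hist_b = regions[b]
        size = size_a + size_b
        # The merged histogram is the size-weighted average of its parts.
        hist = [(size_a * x + size_b * y) / size for x, y in zip(hist_a, hist_b)]

        new = next(next_id)
        regions[new] = (size, hist)
        proposals.append(new)

        # Redirect adjacency of a and b to the merged region.
        rewired = set()
        for p in neighbors:
            if p == (a, b):
                continue
            q = tuple(sorted({new if i in (a, b) else i for i in p}))
            rewired.add(q)
        neighbors = rewired

    return proposals, regions
```

Every intermediate region is kept as a proposal, which is why this kind of grouping yields candidates at multiple scales from a single over-segmentation.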

Results: The methods are tested on a large-scale multi-class database of breast histopathological WSIs. The results demonstrate that for one WSI, our region proposal method can generate 7.3 thousand contoured regions, which fit well with 95.8% of the ROIs annotated by pathologists. The proposed hashing method can retrieve a query image among 136 thousand images in 0.29 s and reach a precision of 91% with only 10% of images labeled.
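
Binary hash codes allow fast retrieval because comparing two codes takes only an XOR and a bit count. The sketch below ranks database codes by Hamming distance to a query code. It is a generic illustration: the abstract does not state the ranking rule, so Hamming ranking is an assumption here, and the function names and integer encoding of codes are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, database_codes: list, top_k: int = 5) -> list:
    """Return indices of the top_k database codes closest to the query.

    Ties keep database order because sorted() is stable.
    """
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:top_k]
```

For example, with database codes 0b1111, 0b0000, 0b0011, 0b0111 and query 0b0001, the Hamming distances are 3, 1, 1, and 2, so the two nearest entries are at indices 1 and 2.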

Conclusions: The unsupervised region proposal method can generate regions as predictions of lesions in histopathological WSIs. The region proposals can also serve as training samples for machine-learning models for image retrieval. The proposed hashing method can achieve fast and precise image retrieval with a small number of labels. Furthermore, the proposed methods can potentially be applied in online computer-aided diagnosis systems.

Keywords: Content-based image retrieval; Latent Dirichlet allocation; Region proposal; Selective Search; Supervised hashing; Whole slide image.

MeSH terms

  • Algorithms
  • Breast / pathology*
  • Breast Neoplasms / pathology*
  • Databases, Factual
  • Diagnosis, Computer-Assisted / methods*
  • Female
  • Histological Techniques
  • Humans
  • Image Interpretation, Computer-Assisted / methods*
  • Image Processing, Computer-Assisted / methods*
  • Machine Learning
  • Models, Statistical
  • Pattern Recognition, Automated / methods
  • Reproducibility of Results
  • Sensitivity and Specificity