Content-based image retrieval of digitized histopathology in boosted spectrally embedded spaces

J Pathol Inform. 2015 Jun 29:6:41. doi: 10.4103/2153-3539.159441. eCollection 2015.

Abstract

Context: Content-based image retrieval (CBIR) systems allow for retrieval of images from within a database that are similar in visual content to a query image. This is useful for digital pathology, where text-based descriptors alone might be inadequate to accurately describe image content. By representing images via a set of quantitative image descriptors, the similarity between a query image and archived, annotated images in a database can be computed and the most similar images retrieved. Recently, non-linear dimensionality reduction methods have become popular for embedding high-dimensional data into a reduced-dimensional space while preserving local object adjacencies, thereby allowing object similarity to be determined more accurately in the reduced-dimensional space. However, in computing the reduced-dimensional representation, most dimensionality reduction methods implicitly assume that all features are equally important.
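The retrieval step described above can be illustrated with a minimal sketch. It assumes each image has already been reduced to a numeric feature vector and that similarity is measured by plain Euclidean distance; the image identifiers, feature values, and the `retrieve` helper are hypothetical and are not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database, k=3):
    """Return the k (distance, image_id, label) entries closest to the query."""
    scored = [
        (euclidean(query, feats), image_id, label)
        for image_id, (feats, label) in database.items()
    ]
    # Smallest distance first, i.e. most similar image first.
    return sorted(scored)[:k]

# Hypothetical annotated archive: image id -> (feature vector, annotation).
database = {
    "img_a": ([0.9, 0.1, 0.3], "malignant"),
    "img_b": ([0.2, 0.8, 0.5], "benign"),
    "img_c": ([0.8, 0.2, 0.4], "malignant"),
    "img_d": ([0.1, 0.9, 0.6], "benign"),
}

query = [0.88, 0.12, 0.32]
top = retrieve(query, database, k=2)
for distance, image_id, label in top:
    print(f"{image_id}\t{label}\t{distance:.3f}")
```

Every feature contributes equally to this distance, which is the implicit assumption the abstract goes on to question.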

Aims: In this paper, we present boosted spectral embedding (BoSE), which uses a boosted distance metric to selectively weight individual features (based on training data) before mapping the data into a reduced-dimensional space.
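To make the idea of feature weighting concrete, the sketch below builds a pairwise affinity matrix from a feature-weighted distance. It is an illustration under stated assumptions, not the authors' implementation: the weights are hand-picked rather than learned by boosting, the Gaussian kernel and `sigma` are assumed, and the eigendecomposition that a spectral embedding would then apply to this matrix is omitted.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def affinity_matrix(features, weights, sigma=1.0):
    """Gaussian affinities exp(-d^2 / (2 * sigma^2)) between all pairs (assumed kernel)."""
    n = len(features)
    return [
        [
            math.exp(
                -weighted_distance(features[i], features[j], weights) ** 2
                / (2 * sigma ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]

# Three toy objects described by two features.
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

# Equal weighting: both features count the same.
equal = affinity_matrix(features, [1.0, 1.0])

# Hypothetical learned weighting: the second feature is judged uninformative.
boosted = affinity_matrix(features, [1.0, 0.0])
```

With the second feature down-weighted to zero, objects 0 and 1 (which differ only in that feature) become maximally similar, so any embedding computed from `boosted` would place them together, whereas `equal` treats them as no closer than objects 0 and 2.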

Settings and design: BoSE is evaluated against spectral embedding (SE) (which employs equal feature weighting) in the context of CBIR of digitized prostate and breast cancer histopathology images.

Materials and methods: The following datasets, comprising a total of 154 hematoxylin and eosin stained histopathology images, were used: (1) prostate cancer histopathology (benign vs. malignant), (2) estrogen receptor-positive (ER+) breast cancer histopathology (low vs. high grade), and (3) HER2+ breast cancer histopathology (low vs. high levels of lymphocytic infiltration).

Statistical analysis used: We plotted precision-recall curves and calculated the area under them (AUPRC), and we calculated classification accuracy using a Random Forest classifier.
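As a rough illustration of how a precision-recall summary can be computed for a single query, the sketch below uses average precision, a common step-wise approximation of the area under the precision-recall curve. The abstract does not say how the area was integrated, so this choice, the `average_precision` helper, and the toy rankings are all assumptions.

```python
def average_precision(ranked_relevance):
    """Average precision for one ranked result list.

    ranked_relevance holds 1 for a relevant retrieved image and 0 otherwise,
    ordered from most to least similar to the query.
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at the rank where recall increases.
            precision_sum += hits / rank
    return precision_sum / total_relevant

# All relevant images ranked first.
perfect = average_precision([1, 1, 0, 0])

# Relevant images interleaved with irrelevant ones.
mixed = average_precision([1, 0, 1, 0])
```

A ranking that places all relevant images first scores 1.0, and the score falls as irrelevant images move ahead of relevant ones.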

Results: BoSE outperformed SE in terms of both CBIR-based (area under the precision-recall curve) and classifier-based (classification accuracy) measures, on average across all of the dimensions tested for all three datasets: (1) prostate cancer histopathology (AUPRC: BoSE = 0.79, SE = 0.63; Accuracy: BoSE = 0.93, SE = 0.80), (2) ER+ breast cancer histopathology (AUPRC: BoSE = 0.79, SE = 0.68; Accuracy: BoSE = 0.96, SE = 0.96), and (3) HER2+ breast cancer histopathology (AUPRC: BoSE = 0.54, SE = 0.44; Accuracy: BoSE = 0.93, SE = 0.91).

Conclusion: Our results suggest that BoSE could serve as an important tool for CBIR and classification of high-dimensional biomedical data.

Keywords: Boosted; breast cancer; content-based image retrieval; histopathology; non-linear dimensionality reduction; prostate cancer; spectral embedding.