Deep embeddings and logistic regression for rapid active learning in histopathological images

Comput Methods Programs Biomed. 2021 Nov:212:106464. doi: 10.1016/j.cmpb.2021.106464. Epub 2021 Oct 13.

Abstract

Background and objective: Recognizing different tissue components is one of the most fundamental tasks in digital pathology. Current methods are often based on convolutional neural networks (CNNs), which need numerous annotated samples for training. Creating large-scale histopathological datasets is labor-intensive, and interactive data annotation is a potential solution.

Methods: We propose DELR (Deep Embedding-based Logistic Regression) to enable rapid model training and inference for histopathological image analysis. DELR uses a pretrained CNN to encode images as compact embeddings at low computational cost. The embeddings are then used to train a logistic regression model efficiently. We implemented DELR in an active learning framework and validated it on three histopathological problems (binary, 4-category, and 8-category classification challenges for lung, breast, and colorectal cancer, respectively). We also investigated the influence of the active learning strategy and the type of encoder.
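The loop described above (encode once, then repeatedly retrain a logistic regression on a growing labeled set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are synthetic Gaussian clusters standing in for pre-encoded CNN features, the query rule is plain uncertainty sampling (one of several possible active learning strategies), the task is binary, and every function name is hypothetical. In practice a library implementation of logistic regression would normally replace the hand-written gradient descent.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Probability of class 1 for one embedding vector."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit binary logistic regression with batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def synthetic_embeddings(n_per_class, dim=8, seed=0):
    """Hypothetical stand-in for pre-encoded CNN embeddings (two Gaussian clusters)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


def active_learning(X, oracle, n_rounds=5, batch=4):
    """Uncertainty sampling: query the pool items whose probability is nearest 0.5."""
    # Seed set: two labeled examples per class (assumed to come from the annotator).
    labeled = []
    for c in (0, 1):
        labeled.extend([i for i, label in enumerate(oracle) if label == c][:2])
    model = train_logreg([X[i] for i in labeled], [oracle[i] for i in labeled])
    for _ in range(n_rounds):
        known = set(labeled)
        pool = [i for i in range(len(X)) if i not in known]
        pool.sort(key=lambda i: abs(predict_proba(model, X[i]) - 0.5))
        labeled.extend(pool[:batch])  # the "annotator" labels the most uncertain items
        # Only the small linear model is retrained; embeddings are never recomputed.
        model = train_logreg([X[i] for i in labeled], [oracle[i] for i in labeled])
    return model, labeled


def accuracy(model, X, y):
    """Fraction of items whose thresholded prediction matches the label."""
    hits = sum((predict_proba(model, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(X)


if __name__ == "__main__":
    X, y = synthetic_embeddings(100)
    model, labeled = active_learning(X, y)
    print(len(labeled), "labels used, pool accuracy:", accuracy(model, X, y))
```

Because each image is encoded once, every annotation round only refits a linear model on low-dimensional vectors, which is what makes an interactive loop of this shape cheap to run.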

Results: On all three datasets, DELR can achieve an area under the curve (AUC) higher than 0.95 with only 100 image patches per class. Although its AUC is slightly lower than that of a fine-tuned CNN counterpart, DELR can be 536, 316, and 1481 times faster after pre-encoding. Moreover, DELR proved compatible with a variety of active learning strategies and encoders.
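For readers unfamiliar with the AUC metric reported above, the binary case can be computed from ranks alone. The sketch below uses the rank-sum (Mann-Whitney U) identity with averaged ranks for ties. It is an illustration only: the function name is hypothetical, and the abstract does not state how AUC was aggregated for the 4- and 8-category tasks, so the multi-class case is not covered here.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via the rank-sum identity; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of items tied with item i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

The value equals the probability that a randomly chosen positive patch receives a higher score than a randomly chosen negative one, with ties counted as one half.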

Conclusions: DELR can achieve accuracy comparable to that of a CNN while running much faster. These advantages make it a potential solution for real-time interactive data annotation.

Keywords: Active learning; Computer-aided diagnosis; Data annotation; Deep learning; Digital pathology; Tissue classification.

MeSH terms

  • Area Under Curve
  • Image Processing, Computer-Assisted
  • Logistic Models
  • Neural Networks, Computer*