Background and objective: Recognizing different tissue components is one of the most fundamental tasks in digital pathology. Current methods are often based on convolutional neural networks (CNNs), which require numerous annotated samples for training. Creating large-scale histopathological datasets is labor-intensive, and interactive data annotation is a potential solution to this bottleneck.
Methods: We propose DELR (Deep Embedding-based Logistic Regression) to enable rapid model training and inference for histopathological image analysis. DELR uses a pretrained CNN to encode images as compact embeddings at low computational cost; these embeddings are then used to train a logistic regression model efficiently. We implemented DELR in an active learning framework and validated it on three histopathological problems: binary, 4-category, and 8-category classification challenges for lung, breast, and colorectal cancer, respectively. We also investigated the influence of the active learning strategy and of the encoder type.
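The pipeline described in Methods (frozen-encoder embeddings, a logistic regression head, and an active learning loop) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions: `simulated_embeddings` is a hypothetical stand-in for pre-encoded CNN features (Gaussian blobs instead of a real pretrained encoder), the classifier is a NumPy softmax regression fit by plain gradient descent, and entropy-based uncertainty sampling is used as one example query strategy among the many the abstract says DELR is compatible with. Because the embeddings are computed once up front, each annotation round only refits the lightweight classifier, which is the kind of reuse the "after pre-encoding" speedup refers to.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def train_logreg(X, y, n_classes, lr=0.1, epochs=500, l2=1e-3):
    """Multinomial logistic regression fit by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        G = (softmax(X @ W + b) - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict_proba(W, b, X):
    return softmax(X @ W + b)


def entropy_query(P, k):
    """Uncertainty sampling: positions of the k rows of P with the highest entropy."""
    H = -(P * np.log(P + 1e-12)).sum(axis=1)
    return np.argsort(-H)[:k]


def simulated_embeddings(n_per_class, n_classes, dim, rng):
    """HYPOTHETICAL stand-in for pre-encoded CNN features: one Gaussian blob per class."""
    centers = rng.normal(scale=3.0, size=(n_classes, dim))
    X = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score so gradient descent is well conditioned
    return X, y


def active_learning(X_pool, y_pool, n_classes, rng, seed_per_class=2, batch=20, rounds=5):
    """Pool-based active learning; y_pool plays the role of the human annotator."""
    labeled = np.concatenate([
        rng.choice(np.flatnonzero(y_pool == c), size=seed_per_class, replace=False)
        for c in range(n_classes)
    ])
    for _ in range(rounds):
        W, b = train_logreg(X_pool[labeled], y_pool[labeled], n_classes)
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        picked = unlabeled[entropy_query(predict_proba(W, b, X_pool[unlabeled]), batch)]
        labeled = np.concatenate([labeled, picked])  # "annotate" the queried patches
    W, b = train_logreg(X_pool[labeled], y_pool[labeled], n_classes)
    return W, b, labeled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = simulated_embeddings(n_per_class=150, n_classes=4, dim=16, rng=rng)
    order = rng.permutation(len(y))
    pool, test = order[:400], order[400:]
    W, b, labeled = active_learning(X[pool], y[pool], 4, rng)
    acc = (predict_proba(W, b, X[test]).argmax(axis=1) == y[test]).mean()
    print(f"labeled {len(labeled)} of {len(pool)} pool samples, test accuracy {acc:.3f}")
```

The sketch stays NumPy-only so it runs without extra dependencies; in practice the hand-written solver could be swapped for an off-the-shelf one such as scikit-learn's `LogisticRegression`, and the simulated blobs would be replaced by features from whichever pretrained encoder is under study.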
Results: On all three datasets, DELR achieves an area under the curve (AUC) above 0.95 with only 100 image patches per class. Although its AUC is slightly lower than that of a fine-tuned CNN counterpart, DELR can run 536, 316, and 1481 times faster, respectively, once the images have been pre-encoded. Moreover, DELR is shown to be compatible with a variety of active learning strategies and encoders.
Conclusions: DELR achieves accuracy comparable to that of a CNN while running much faster. These advantages make it a promising solution for real-time interactive data annotation.
Keywords: Active learning; Computer-aided diagnosis; Data annotation; Deep learning; Digital pathology; Tissue classification.
Copyright © 2021. Published by Elsevier B.V.