Effective and efficient active learning for deep learning-based tissue image analysis

Bioinformatics. 2023 Apr 3;39(4):btad138. doi: 10.1093/bioinformatics/btad138.

Abstract

Motivation: Deep learning attained excellent results in digital pathology recently. A challenge with its use is that high quality, representative training datasets are required to build robust models. Data annotation in the domain is labor intensive and demands substantial time commitment from expert pathologists. Active learning (AL) is a strategy to minimize annotation. The goal is to select samples from the pool of unlabeled data for annotation that improves model accuracy. However, AL is a very compute demanding approach. The benefits for model learning may vary according to the strategy used, and it may be hard for a domain specialist to fine tune the solution without an integrated interface.

Results: We developed a framework that includes a friendly user interface along with run-time optimizations to reduce annotation and execution time in AL in digital pathology. Our solution implements several AL strategies along with our diversity-aware data acquisition (DADA) acquisition function, which enforces data diversity to improve the prediction performance of a model. In this work, we employed a model simplification strategy [Network Auto-Reduction (NAR)] that significantly improves AL execution time when coupled with DADA. NAR produces less compute demanding models, which replace the target models during the AL process to reduce processing demands. An evaluation with a tumor-infiltrating lymphocytes classification application shows that: (i) DADA attains superior performance compared to state-of-the-art AL strategies for different convolutional neural networks (CNNs), (ii) NAR improves the AL execution time by up to 4.3×, and (iii) target models trained with patches/data selected by the NAR reduced versions achieve similar or superior classification quality to using target CNNs for data selection.

Availability and implementation: Source code: https://github.com/alsmeirelles/DADA.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Data Curation
  • Deep Learning*
  • Image Processing, Computer-Assisted
  • Neural Networks, Computer
  • Software