Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology

David Tellez; Geert Litjens; Péter Bándi; Wouter Bulten; John-Melle Bokhorst; Francesco Ciompi; Jeroen van der Laak

doi:10.1016/j.media.2019.101544

Med Image Anal. 2019 Dec:58:101544. doi: 10.1016/j.media.2019.101544. Epub 2019 Aug 21.

Authors

David Tellez¹, Geert Litjens², Péter Bándi², Wouter Bulten², John-Melle Bokhorst², Francesco Ciompi², Jeroen van der Laak³

Affiliations

¹ Diagnostic Image Analysis Group and the Department of Pathology, Radboud University Medical Center, Nijmegen, the Netherlands. Electronic address: david.tellezmartin@radboudumc.nl.
² Diagnostic Image Analysis Group and the Department of Pathology, Radboud University Medical Center, Nijmegen, the Netherlands.
³ Diagnostic Image Analysis Group and the Department of Pathology, Radboud University Medical Center, Nijmegen, the Netherlands; Center for Medical Image Science and Visualization, Linköping University, Linköping, Sweden.

PMID: 31466046
DOI: 10.1016/j.media.2019.101544

Abstract

Stain variation is the phenomenon whereby tissue slides stained at different pathology laboratories exhibit similar but not identical color appearance. Because of this color shift between laboratories, convolutional neural networks (CNNs) trained with images from one lab often underperform on unseen images from another lab. Several techniques have been proposed to reduce the generalization error, mainly grouped into two categories: stain color augmentation and stain color normalization. The former simulates a wide variety of realistic stain variations during training, producing stain-invariant CNNs. The latter aims to match training and test color distributions in order to reduce stain variation. For the first time, we compared some of these techniques and quantified their effect on CNN classification performance using a heterogeneous dataset of hematoxylin and eosin histopathology images from 4 organs and 9 pathology laboratories. Additionally, we propose a novel unsupervised method to perform stain color normalization using a neural network. Based on our experimental results, we provide practical guidelines on how to use stain color augmentation and stain color normalization in future computational pathology applications.
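To make the distinction between the two categories concrete, the following is a minimal sketch and not the authors' implementation. It assumes 8-bit RGB patches held in NumPy arrays. The function names (`augment_stain`, `normalize_to_reference`), the perturbation range `sigma`, and the reference statistics are all hypothetical choices for illustration. The augmentation randomly perturbs each channel in optical-density space, while the normalization is a simple per-channel mean and standard deviation match to a reference, which is far simpler than the neural-network-based normalization proposed in the paper.

```python
import numpy as np


def augment_stain(rgb, rng, sigma=0.05):
    """Illustrative stain color augmentation (hypothetical helper).

    Perturbs each color channel in optical-density space with a random
    scale and shift, simulating a plausible stain variation.
    """
    # Beer-Lambert style conversion from intensity [0, 255] to optical density.
    od = -np.log((rgb.astype(np.float64) + 1.0) / 256.0)
    # One random multiplicative and one additive factor per channel.
    alpha = rng.uniform(1.0 - sigma, 1.0 + sigma, size=3)
    beta = rng.uniform(-sigma, sigma, size=3)
    od_aug = od * alpha + beta
    # Convert back to intensities and clamp to the valid 8-bit range.
    out = 256.0 * np.exp(-od_aug) - 1.0
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def normalize_to_reference(rgb, ref_mean, ref_std):
    """Illustrative stain color normalization (hypothetical helper).

    Shifts and scales each channel so its mean and standard deviation
    match those of a chosen reference distribution.
    """
    x = rgb.astype(np.float64)
    mean = x.mean(axis=(0, 1))
    std = x.std(axis=(0, 1)) + 1e-8  # avoid division by zero on flat patches
    out = (x - mean) / std * np.asarray(ref_std) + np.asarray(ref_mean)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Synthetic stand-in for a histopathology patch (assumption: no real data here).
rng = np.random.default_rng(0)
patch = rng.integers(50, 200, size=(8, 8, 3), dtype=np.uint8)

augmented = augment_stain(patch, rng, sigma=0.05)
normalized = normalize_to_reference(
    patch, ref_mean=[180.0, 120.0, 160.0], ref_std=[20.0, 20.0, 20.0]
)
```

In this sketch, augmentation would be applied with fresh random factors to every training patch, so the network sees many color variants of the same tissue. Normalization would be applied with fixed reference statistics to both training and test patches, so that all images are mapped toward one color distribution.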

Keywords: Computational pathology; Convolutional neural network; Deep learning.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Color
Datasets as Topic
Eosine Yellowish-(YS)
Hematoxylin
Humans
Neural Networks, Computer*
Pathology, Clinical / standards*
Staining and Labeling*
Unsupervised Machine Learning

Substances

Eosine Yellowish-(YS)
Hematoxylin