Prediction of Breast Cancer Recurrence Using a Deep Convolutional Neural Network Without Region-of-Interest Labeling

Nam Nhut Phan; Chih-Yi Hsu; Chi-Cheng Huang; Ling-Ming Tseng; Eric Y Chuang

doi:10.3389/fonc.2021.734015

Front Oncol. 2021 Oct 21;11:734015. doi: 10.3389/fonc.2021.734015. eCollection 2021.

Authors

Nam Nhut Phan^{1,2,3}, Chih-Yi Hsu^{4,5,6}, Chi-Cheng Huang^{7,8}, Ling-Ming Tseng^{5,7}, Eric Y Chuang^{2,3,9}

Affiliations

¹ Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan.
² Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.
³ Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan.
⁴ Department of Pathology and Laboratory Medicine, Taipei Veterans General Hospital, Taipei, Taiwan.
⁵ School of Medicine, National Yang-Ming University, Taipei, Taiwan.
⁶ College of Nursing, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan.
⁷ Comprehensive Breast Health Center, Taipei Veterans General Hospital, Taipei, Taiwan.
⁸ Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.
⁹ Master Program for Biomedical Engineering, China Medical University, Taichung, Taiwan.

Abstract

Purpose: The present study aimed to assign a risk score for breast cancer recurrence based on pathological whole slide images (WSIs) using a deep learning model.

Methods: A total of 233 WSIs from 138 breast cancer patients were assigned either a low-risk or a high-risk score based on a 70-gene signature. These images were processed into patches of 512 × 512 pixels with the PyHIST tool and color-normalized using the Macenko method. Out-of-focus and pixelated patches were then removed using the Laplacian algorithm. The remaining patches (n=294,562) were split into three parts for model training (50%), validation (7%), and testing (43%). We used six pretrained models for transfer learning and evaluated their performance using accuracy, precision, recall, F1 score, confusion matrix, and AUC. To demonstrate the robustness and generalization capacity of the final model, it was evaluated on the testing set. Finally, the Grad-CAM algorithm was used for model visualization.
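The focus-filtering step can be illustrated with a minimal sketch. The abstract only names "the Laplacian algorithm"; the version below assumes the common variance-of-Laplacian sharpness score, which may differ from what the authors ran. The function names, the 4-neighbour kernel, and the threshold of 100.0 are hypothetical choices for illustration, not values from the study.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour discrete Laplacian over interior pixels.

    `gray` is a 2D list of grayscale intensities (rows of equal length).
    Low values suggest a blurred or featureless patch.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("patch must be at least 3 x 3 pixels")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Sum of the four neighbours minus four times the centre pixel.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


def is_sharp(gray, threshold=100.0):
    """Keep a patch when its score reaches the (assumed) threshold."""
    return laplacian_variance(gray) >= threshold


# Toy patches, not study data: a uniform patch has no edges at all,
# while a checkerboard has the strongest possible edge response.
flat_patch = [[128] * 8 for _ in range(8)]
checker_patch = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]

kept = [name for name, patch in [("flat", flat_patch), ("checker", checker_patch)]
        if is_sharp(patch)]
```

In practice the threshold would need to be tuned on the slides at hand, since staining and scanner resolution both shift the score distribution.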

Results: Six models, namely VGG16, ResNet50, ResNet101, Inception_ResNet, EfficientNetB5, and Xception, achieved high performance in the validation set, with overall accuracies of 0.84, 0.85, 0.83, 0.84, 0.87, and 0.91, respectively. We selected Xception for assessment of the testing set, and this model achieved an overall accuracy of 0.87 with a patch-wise approach and accuracies of 0.90 and 1.00 with a patient-wise approach for the high-risk and low-risk groups, respectively.
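The difference between patch-wise and patient-wise evaluation can be sketched as follows. The abstract does not state how patch predictions were combined into a patient-level call, so this example assumes a simple majority vote over each patient's patches; the function names, patient identifiers, and labels are hypothetical toy values, not study data.

```python
from collections import Counter, defaultdict


def patient_level_calls(patch_predictions):
    """Majority vote per patient over (patient_id, predicted_label) pairs.

    Ties resolve to the label seen first for that patient, which is an
    arbitrary choice made only for this sketch.
    """
    votes = defaultdict(Counter)
    for patient_id, label in patch_predictions:
        votes[patient_id][label] += 1
    return {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}


def per_group_accuracy(calls, truth):
    """Fraction of patients called correctly within each true risk group."""
    correct, total = Counter(), Counter()
    for patient_id, true_label in truth.items():
        total[true_label] += 1
        if calls.get(patient_id) == true_label:
            correct[true_label] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy patch-level predictions for three hypothetical patients.
patch_predictions = [
    ("P1", "high"), ("P1", "high"), ("P1", "low"),
    ("P2", "low"), ("P2", "low"),
    ("P3", "low"), ("P3", "high"), ("P3", "high"),
]
truth = {"P1": "high", "P2": "low", "P3": "low"}

calls = patient_level_calls(patch_predictions)
accuracy_by_group = per_group_accuracy(calls, truth)
```

Here `calls` assigns one label per patient, and `accuracy_by_group` reports accuracy separately for the high-risk and low-risk groups, mirroring how the patient-wise figures are reported per group.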

Conclusions: Our study demonstrated the feasibility and high performance of artificial intelligence models trained without region-of-interest labeling for predicting cancer recurrence based on a 70-gene signature risk score.

Keywords: 70-gene signature; deep learning; label-free; pathology; transfer learning; whole slide image.