Deep learning-based subtyping of gastric cancer histology predicts clinical outcome: a multi-institutional retrospective study

Gregory Patrick Veldhuizen; Christoph Röcken; Hans-Michael Behrens; Didem Cifci; Hannah Sophie Muti; Takaki Yoshikawa; Tomio Arai; Takashi Oshima; Patrick Tan; Matthias P Ebert; Alexander T Pearson; Julien Calderaro; Heike I Grabsch; Jakob Nikolas Kather

doi:10.1007/s10120-023-01398-x

Gastric Cancer. 2023 Sep;26(5):708-720. doi: 10.1007/s10120-023-01398-x. Epub 2023 Jun 3.

Authors

Gregory Patrick Veldhuizen¹, Christoph Röcken², Hans-Michael Behrens², Didem Cifci¹,³, Hannah Sophie Muti¹,⁴, Takaki Yoshikawa⁵, Tomio Arai⁶, Takashi Oshima⁷, Patrick Tan⁸, Matthias P Ebert⁹,¹⁰,¹¹,¹², Alexander T Pearson¹³, Julien Calderaro¹⁴,¹⁵, Heike I Grabsch¹⁶,¹⁷, Jakob Nikolas Kather¹⁸,¹⁹,²⁰,²¹

Affiliations

¹ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
² Department of Pathology, Christian-Albrechts University, Kiel, Germany.
³ Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
⁴ Department of Visceral, Thoracic and Vascular Surgery, Technical University Dresden, University Hospital Carl Gustav Carus, Dresden, Germany.
⁵ Department of Gastric Surgery, National Cancer Center Hospital, Tokyo, Japan.
⁶ Department of Pathology, Tokyo Metropolitan Geriatric Hospital and Institute of Gerontology, Tokyo, Japan.
⁷ Department of Gastrointestinal Surgery, Kanagawa Cancer Center, Yokohama, Japan.
⁸ Duke-NUS Medical School, Singapore, Singapore.
⁹ Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
¹⁰ DKFZ-Hector Cancer Institute at the University Medical Center, Mannheim, Germany.
¹¹ Clinical Cooperation Unit Healthy Metabolism, Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
¹² Mannheim Institute for Innate Immunoscience (MI3), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
¹³ Department of Medicine, Section of Hematology/Oncology, The University of Chicago, Chicago, IL, USA.
¹⁴ Université Paris Est Créteil, INSERM, IMRB, Créteil, France.
¹⁵ Department of Pathology, Assistance Publique-Hôpitaux de Paris, Henri Mondor-Albert Chenevier University Hospital, Créteil, France.
¹⁶ Pathology & Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK.
¹⁷ Department of Pathology, GROW School for Oncology and Reproduction, Maastricht University Medical Center+, Maastricht, The Netherlands.
¹⁸ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany. jakob-nikolas.kather@alumni.dkfz.de.
¹⁹ Pathology & Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK. jakob-nikolas.kather@alumni.dkfz.de.
²⁰ Department of Medicine I, University Hospital Dresden, Dresden, Germany. jakob-nikolas.kather@alumni.dkfz.de.
²¹ Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany. jakob-nikolas.kather@alumni.dkfz.de.

Abstract

Introduction: The Laurén classification is widely used for Gastric Cancer (GC) histology subtyping. However, this classification is prone to interobserver variability and its prognostic value remains controversial. Deep Learning (DL)-based assessment of hematoxylin and eosin (H&E) stained slides is a potentially useful tool to provide an additional layer of clinically relevant information, but has not been systematically assessed in GC.

Objective: We aimed to train, test and externally validate a deep learning-based classifier for GC histology subtyping using routine H&E stained tissue sections from gastric adenocarcinomas and to assess its potential prognostic utility.

Methods: We trained a binary classifier on intestinal and diffuse type GC whole slide images from a subset of the TCGA cohort (N = 166) using attention-based multiple instance learning. Ground-truth labels for these 166 GC cases were assigned by two expert pathologists. We deployed the model on two external GC patient cohorts, one from Europe (N = 322) and one from Japan (N = 243). We assessed classification performance using the Area Under the Receiver Operating Characteristic Curve (AUROC), and the prognostic value of the DL-based classifier (overall, cancer-specific and disease-free survival) using uni- and multivariate Cox proportional hazards models and Kaplan-Meier curves with log-rank test statistics.
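The attention-based multiple instance learning step named in the Methods can be illustrated with a minimal sketch. It assumes the common formulation in which each tile of a whole slide image is first reduced to a feature vector, a small attention network scores every tile, and the softmax-weighted average of the tile vectors is passed to a logistic head. The function names, weights (`V`, `w`, `beta`, `bias`) and feature values below are hypothetical and are not taken from the study.

```python
import math

def attention_mil_pool(tiles, V, w):
    """Pool per-tile feature vectors into one slide-level vector.

    tiles: list of K feature vectors (length D), one per image tile
    V:     hidden projection, H rows of length D (hypothetical weights)
    w:     attention vector of length H (hypothetical weights)
    Returns (attention weights over tiles, pooled slide vector).
    """
    scores = []
    for h in tiles:
        # Hidden attention representation: tanh(V h)
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        # Unnormalised attention score: w^T tanh(V h)
        scores.append(sum(wj * uj for wj, uj in zip(w, hidden)))
    # Softmax over tiles, shifted by the max score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted average of the tile vectors
    dim = len(tiles[0])
    pooled = [sum(a * h[d] for a, h in zip(attn, tiles)) for d in range(dim)]
    return attn, pooled

def slide_probability(pooled, beta, bias):
    """Logistic head mapping the pooled vector to a class probability."""
    logit = sum(b * x for b, x in zip(beta, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy slide with three tiles and two features per tile (made-up numbers)
tiles = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
attn, pooled = attention_mil_pool(tiles, V, w)
prob = slide_probability(pooled, beta=[1.5, -0.5], bias=0.0)
print(attn, pooled, prob)
```

In this formulation only the slide-level label is needed for training, and the attention weights indicate which tiles contributed most to the slide-level prediction.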

Results: Internal validation on the TCGA GC cohort with five-fold cross-validation achieved a mean AUROC of 0.93 ± 0.07. External validation showed that the DL-based classifier stratified GC patients' 5-year survival better than pathologist-based Laurén classification for all survival endpoints, despite frequently divergent model-pathologist classifications. Univariate overall survival Hazard Ratios (HRs) of pathologist-based Laurén classification (diffuse type versus intestinal type) were 1.14 (95% Confidence Interval (CI) 0.66-1.44, p-value = 0.51) and 1.23 (95% CI 0.96-1.43, p-value = 0.09) in the Japanese and European cohorts, respectively. DL-based histology classification resulted in HRs of 1.46 (95% CI 1.18-1.65, p-value < 0.005) and 1.41 (95% CI 1.20-1.57, p-value < 0.005) in the Japanese and European cohorts, respectively. In diffuse type GC (as defined by the pathologist), classifying patients using the DL diffuse and intestinal classifications provided a superior survival stratification, and demonstrated statistically significant survival stratification when combined with pathologist classification for both the Japanese (overall survival log-rank test p-value < 0.005, HR 1.43 (95% CI 1.05-1.66, p-value = 0.03)) and European cohorts (overall survival log-rank test p-value < 0.005, HR 1.56 (95% CI 1.16-1.76, p-value < 0.005)).
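For readers unfamiliar with the AUROC figure reported above, the following sketch shows one way such a number can be computed and summarised across folds. It uses the pairwise (Mann-Whitney) definition of AUROC: the fraction of positive-negative case pairs in which the positive case receives the higher score, with ties counted as one half. The labels, scores and per-fold values are invented for illustration, and treating the "±" as a standard deviation across folds is an assumption rather than something the abstract states.

```python
from statistics import mean, stdev

def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: 1 = diffuse type, 0 = intestinal type (made-up data)
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]
toy_auroc = auroc(labels, scores)

# Hypothetical per-fold AUROCs from a five-fold cross-validation
fold_aurocs = [0.95, 0.88, 0.99, 0.91, 0.97]
summary = (mean(fold_aurocs), stdev(fold_aurocs))
print(toy_auroc, summary)
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; rank-based implementations give the same value more efficiently.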

Conclusion: Our study shows that gastric adenocarcinoma subtyping using the pathologists' Laurén classification as ground truth can be performed using current state-of-the-art DL techniques. DL-based histology typing appears to stratify patient survival better than expert pathologist histology typing. DL-based GC histology typing has potential as an aid in subtyping. Further investigations are warranted to fully understand the underlying biological mechanisms for the improved survival stratification despite apparently imperfect classification by the DL algorithm.

Keywords: Deep learning classifier; Eosin staining; Gastric cancer histology; Hematoxylin; Laurén classification; Prognostic utility; Survival stratification.

Publication types

Multicenter Study

MeSH terms

Adenocarcinoma* / pathology
Deep Learning*
Humans
Prognosis
Proportional Hazards Models
Retrospective Studies
Stomach Neoplasms* / pathology
