Surrogate Biomarker Prediction from Whole-Slide Images for Evaluating Overall Survival in Lung Adenocarcinoma

Diagnostics (Basel). 2024 Feb 20;14(5):462. doi: 10.3390/diagnostics14050462.

Abstract

Background: Recent advances in computational pathology have shown potential in predicting biomarkers from haematoxylin and eosin (H&E) whole-slide images (WSI). However, predicting the outcome directly from WSIs remains a substantial challenge. In this study, we aimed to investigate how gene expression, predicted from WSIs, could be used to evaluate overall survival (OS) in patients with lung adenocarcinoma (LUAD).

Methods: Differentially expressed genes (DEGs) were identified from The Cancer Genome Atlas (TCGA)-LUAD cohort. Cox regression analysis was performed on DEGs to identify the gene prognostics of OS. Attention-based multiple instance learning (AMIL) models were trained to predict the expression of identified prognostic genes from WSIs using the TCGA-LUAD dataset. Models were externally validated in the Clinical Proteomic Tumour Analysis Consortium (CPTAC)-LUAD dataset. The prognostic value of predicted gene expression values was then compared to the true gene expression measurements.

Results: The expression of 239 prognostic genes could be predicted in TCGA-LUAD with cross-validated Pearson's R > 0.4. Predicted gene expression demonstrated prognostic performance, attaining a cross-validated concordance index of up to 0.615 in TCGA-LUAD through Cox regression. In total, 36 genes had predicted expression in the external validation cohort that was prognostic of OS.

Conclusions: Gene expression predicted from WSIs is an effective method of evaluating OS in patients with LUAD. These results may open up new avenues of cost- and time-efficient prognosis assessment in LUAD treatment.

Keywords: biomarkers; computational pathology; deep learning; lung adenocarcinoma; survival.