SurvIAE: Survival prediction with Interpretable Autoencoders from Diffuse Large B-Cells Lymphoma gene expression data

Comput Methods Programs Biomed. 2024 Feb:244:107966. doi: 10.1016/j.cmpb.2023.107966. Epub 2023 Dec 2.

Abstract

Background: In Diffuse Large B-Cell Lymphoma (DLBCL), several methodologies are emerging to derive novel biomarkers for incorporation into risk assessment. We developed a pipeline that relies on autoencoders (AE) and Explainable Artificial Intelligence (XAI) to stratify prognosis and derive a gene-based signature.

Methods: An AE was used to learn an unsupervised representation of gene expression (GE) data from three publicly available datasets, each generated with a different technology. A multi-layer perceptron (MLP) was used to classify prognosis from the latent representation. GE data were prepared in three preprocessed forms: normalized, scaled, and standardized. Four AE architectures (Large, Medium, Small, and Extra Small) were compared to find the most suitable for GE data. The joint AE-MLP classified patients on six outcomes: overall survival (OS) at 12, 36, and 60 months and progression-free survival (PFS) at 12, 36, and 60 months. XAI techniques were used to derive a gene-based signature aimed at refining the Revised International Prognostic Index (R-IPI) risk, which was validated in a fourth, independent, publicly available dataset. We named our tool SurvIAE: Survival prediction with Interpretable AE.
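As an illustration of the "scaled" and "standardized" preprocessing mentioned above, the following is a minimal sketch only. The abstract does not define these transforms, so the code assumes min-max scaling to [0, 1] and z-score standardization, applied per gene and within each dataset separately. The function names, the dataset names, and the toy values are hypothetical and are not taken from the SurvIAE pipeline.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale one gene's expression values to the [0, 1] range (assumed meaning of 'scaled')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant gene carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score one gene's values: mean 0, population standard deviation 1 (assumed meaning of 'standardized')."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def preprocess_per_dataset(datasets, transform):
    """Apply a per-gene transform inside each dataset independently.

    datasets: {dataset_name: {gene_name: [expression value per patient]}}
    """
    return {
        name: {gene: transform(vals) for gene, vals in genes.items()}
        for name, genes in datasets.items()
    }


# Hypothetical toy data: the same gene measured on two platforms with different ranges.
toy = {
    "dataset_A": {"GAB1": [2.0, 4.0, 6.0]},
    "dataset_B": {"GAB1": [20.0, 40.0, 60.0]},
}

scaled = preprocess_per_dataset(toy, min_max_scale)
standardized = preprocess_per_dataset(toy, standardize)
```

In this toy case both transforms map the two datasets onto the same value range, which gives an intuition for how per-dataset rescaling could lessen platform-specific offsets. Whether the published pipeline fits the transforms per dataset or on pooled data is not stated in the abstract.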

Results: In the latent space of the AEs, we observed that scaled and standardized data reduced the batch effect. SurvIAE models outperformed R-IPI, with a Matthews Correlation Coefficient (MCC) of up to 0.42 vs. 0.18 on the validation set (PFS36) and up to 0.30 vs. 0.19 on the test set (PFS60). We selected SurvIAE-Small-PFS36 as the best model and, from its gene signature, stratified patients into three risk groups: (1) R-IPI Poor patients with High levels of GAB1; (2) R-IPI Poor patients with Low levels of GAB1 or R-IPI Good/Very Good patients with Low levels of GPR132; and (3) R-IPI Good/Very Good patients with High levels of GPR132.
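The two quantitative ideas in the results can be sketched as follows, under stated assumptions. The first function computes the Matthews Correlation Coefficient from binary labels using its standard confusion-matrix definition. The second encodes the three-group rule in the order the groups are listed above. The abstract does not give the expression cutoffs that separate High from Low, so the High/Low flags are taken as already-computed inputs; the function names and the group numbering are illustrative and are not part of the published tool.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (0/1), computed from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def risk_group(r_ipi, gab1_high, gpr132_high):
    """Assign a group number (1, 2, or 3) following the order in the text.

    r_ipi: "Poor", "Good", or "Very Good"
    gab1_high / gpr132_high: booleans; how High vs. Low is decided is an
    assumption left to the caller, since the cutoff is not stated in the abstract.
    """
    if r_ipi == "Poor":
        return 1 if gab1_high else 2
    if r_ipi in ("Good", "Very Good"):
        return 3 if gpr132_high else 2
    raise ValueError(f"unknown R-IPI category: {r_ipi!r}")


# Hypothetical toy labels, not data from the study.
toy_mcc = matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0])
toy_groups = [
    risk_group("Poor", gab1_high=True, gpr132_high=False),
    risk_group("Poor", gab1_high=False, gpr132_high=True),
    risk_group("Very Good", gab1_high=True, gpr132_high=False),
    risk_group("Good", gab1_high=False, gpr132_high=True),
]
```

Note that only GAB1 matters for R-IPI Poor patients and only GPR132 matters for R-IPI Good/Very Good patients, so the middle group pools patients from both R-IPI strata.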

Conclusions: SurvIAE showed the potential to derive a gene signature for translational use in DLBCL. The pipeline has been made publicly available and can be reused for other pathologies.

Keywords: Autoencoder; Explainable Artificial Intelligence; Gene expression data; Survival prediction.

MeSH terms

  • Antineoplastic Combined Chemotherapy Protocols
  • Artificial Intelligence*
  • Gene Expression
  • Humans
  • Lymphoma, Large B-Cell, Diffuse* / drug therapy
  • Lymphoma, Large B-Cell, Diffuse* / genetics
  • Prognosis
  • Retrospective Studies