Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data

Edian F Franco; Pratip Rana; Aline Cruz; Víctor V Calderón; Vasco Azevedo; Rommel T J Ramos; Preetam Ghosh

doi:10.3390/cancers13092013

Cancers (Basel). 2021 Apr 22;13(9):2013. doi: 10.3390/cancers13092013.

Authors

Edian F Franco¹ ² ³, Pratip Rana⁴, Aline Cruz⁵, Víctor V Calderón³, Vasco Azevedo⁶, Rommel T J Ramos³, Preetam Ghosh⁴

Affiliations

¹ Institute of Biological Sciences, Federal University of Para, Belem, PA 66075-110, Brazil.
² Laboratory of Virology and Environmental Genomics, Instituto de Innovacion en Biotecnologia e Industria (IIBI), Santo Domingo 10104, Dominican Republic.
³ Instituto Tecnológico de Santo Domingo (INTEC), Santo Domingo 10602, Dominican Republic.
⁴ Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA.
⁵ Programa de Pós-Graduação em Enfermagem, Federal University of Para, Belem, PA 66075-110, Brazil.
⁶ Institute of Biological Science, Federal University of Minas Gerais, Belo Horizonte, MG 31270-901, Brazil.

Abstract

A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending on the activated pathway(s), patient survival varies significantly, and patients respond differently to various drugs. Therefore, cancer subtype detection using genomics-level data is a significant research problem. Subtype detection is often a complex problem and, in most cases, requires multi-omics data fusion to achieve accurate subtyping. Different data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed cancer subtype detection on four different cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also estimated the optimal number of subtypes in each cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effect of feature selection and similarity measures on subtype detection. For further evaluation, we used the Glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each of the subtypes. The results obtained are consistent with other genomic studies and can be corroborated with the involved pathways and biological functions. These findings show that the results from the autoencoders, obtained through the interaction of different cancer data types, can be used for the prediction and characterization of patient subgroups and survival profiles.

Keywords: autoencoder; cancer subtype detection; data integration; multi-omics data; survival analysis.
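
The abstract mentions choosing the number of subtypes with the silhouette score. The following is a minimal illustrative sketch of that idea, not the authors' implementation. It assumes the patients have already been reduced to low-dimensional features (here, synthetic 2-D points stand in for autoencoder latent features), and it assumes k-means as the clustering step, which the abstract does not specify. All function names are hypothetical; in practice a library routine such as scikit-learn's `silhouette_score` would normally be used instead of the hand-written version.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centers(points, k):
    # Deterministic farthest-first initialization (an assumption made for
    # reproducibility; random or k-means++ initialization is also common).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in for latent features: three well-separated groups.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
          for cx, cy in blob_centers for _ in range(20)]

# Score each candidate number of subtypes and keep the best one.
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the highest mean silhouette should correspond to the three groups that were generated. On real multi-omics features the maximum is usually less pronounced, so the chosen number of subtypes is typically checked against an external criterion such as the survival differences the abstract describes.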
