Molecular Subtyping of Cancer Based on Robust Graph Neural Network and Multi-Omics Data Integration

Chaoyi Yin; Yangkun Cao; Peishuo Sun; Hengyuan Zhang; Zhi Li; Ying Xu; Huiyan Sun

doi:10.3389/fgene.2022.884028

Front Genet. 2022 May 13:13:884028. doi: 10.3389/fgene.2022.884028. eCollection 2022.

Authors

Chaoyi Yin¹, Yangkun Cao¹, Peishuo Sun¹, Hengyuan Zhang¹, Zhi Li², Ying Xu³, Huiyan Sun¹

Affiliations

¹ School of Artificial Intelligence, Jilin University, Changchun, China.
² Department of Medical Oncology, the First Hospital of China Medical University, Shenyang, China.
³ Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, GA, United States.

Abstract

Accurate prediction of the molecular subtypes of cancer patients is important for personalized cancer diagnosis and treatment. The large amount of available multi-omics data and the advancement of data-driven methods are expected to facilitate molecular subtyping of cancer. Most existing machine learning-based methods classify samples using a single type of omics data, fail to integrate multi-omics data to learn comprehensive representations of the samples, and ignore the fact that information transfer and aggregation among samples can better represent them and ultimately help in classification. We propose a novel framework named multi-omics graph convolutional network (M-GCN) for molecular subtyping, based on robust graph convolutional networks that integrate multi-omics data. We first apply the Hilbert-Schmidt independence criterion least absolute shrinkage and selection operator (HSIC Lasso) to select the molecular subtype-related transcriptomic features and then use these features to construct a low-noise sample-sample similarity graph. Next, we take the selected gene expression, single nucleotide variant (SNV), and copy number variation (CNV) data as input and learn multi-view representations of the samples. On this basis, a robust variant of the graph convolutional network (GCN) model is finally developed to obtain new representations of the samples by aggregating their subgraphs. Experimental results on breast and stomach cancer demonstrate that the classification performance of M-GCN is superior to that of other existing methods. Moreover, the identified subtype-specific biomarkers are highly consistent with current clinical understanding and show promise for assisting accurate diagnosis and targeted drug development.
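The graph-based part of the pipeline can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a k-nearest-neighbour graph built from Pearson correlation between samples and uses the standard GCN propagation rule ReLU(D^(-1/2) (A + I) D^(-1/2) X W) rather than the robust variant described in the abstract. The function names, the value of k, the toy data, and the random weights are all hypothetical, and the HSIC Lasso feature selection step is omitted.

```python
import numpy as np


def knn_similarity_graph(features, k=3):
    """Symmetric k-nearest-neighbour graph from Pearson correlation between samples.

    features: array of shape (n_samples, n_features), e.g. selected gene expression.
    """
    sim = np.corrcoef(features)          # rows are samples, so this is sample-by-sample
    np.fill_diagonal(sim, -np.inf)       # a sample is never its own neighbour
    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]   # k most correlated samples per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)        # keep an edge if either endpoint selected it


def normalize_adjacency(adj):
    """Symmetric normalisation with self-loops: D^(-1/2) (A + I) D^(-1/2)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, x, weight):
    """One standard GCN layer: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ x @ weight, 0.0)


# Toy data (an assumption for illustration): 8 samples, 20 selected features.
rng = np.random.default_rng(0)
expr = rng.normal(size=(8, 20))

adj = knn_similarity_graph(expr, k=3)
a_norm = normalize_adjacency(adj)
weight = rng.normal(size=(20, 4))        # untrained weights, illustration only
hidden = gcn_layer(a_norm, expr, weight) # new 4-dimensional sample representations
```

In a full model, the weights would be trained against subtype labels and the SNV and CNV views would be combined with the expression features before or during propagation; the sketch only shows how a sample-sample graph lets each sample's representation draw on its neighbours.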

Keywords: feature selection; graph convolutional networks; molecular subtyping of cancer; multi-omics data; subtype-specific biomarkers.