Multi-Omic Data Improve Prediction of Personalized Tumor Suppressors and Oncogenes

Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman

doi:10.3389/fgene.2022.854190

Front Genet. 2022 May 10:13:854190. doi: 10.3389/fgene.2022.854190. eCollection 2022.

Authors

Malvika Sudhakar^{1,2,3}, Raghunathan Rengaswamy^{1,2,4}, Karthik Raman^{1,2,3}

Affiliations

¹ Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India.
² Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India.
³ Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, IIT Madras, Chennai, India.
⁴ Department of Chemical Engineering, IIT Madras, Chennai, India.

Abstract

The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes, but they require multiple samples to identify less frequently mutated ones. Many studies use different methods to distinguish driver mutations/genes from mutations that have no impact on tumor progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify personalized driver genes based on changes in expression. Our method is the first machine learning model to classify genes as tumor suppressor genes (TSGs), oncogenes (OGs), or neutral, thus assigning each gene's functional impact in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalized Identification of driVer OGs and TSGs), that trains on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalized driver genes, we label the data using four strategies and, based on classification metrics, show that gene-based labeling strategies perform best. We build different models using SNV, RNA, and multi-omic features so that a model can be chosen based on the data available. Our models trained on multi-omic data improved predictions compared with those trained on mutation and expression data, achieving an accuracy $\geq 0.99$ for the BRCA, LUAD, and COAD datasets. We show that network- and expression-based features contribute the most to PIVOT. Our predictions on the BRCA, COAD, and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9, and PSMD4. Our multi-omic model labels both CNVs and mutations, with a larger contribution from CNV alterations. While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalized driver genes as TSGs and OGs and also identifies rare driver genes.

Keywords: PIVOT; cancer genomics; driver genes; machine learning; personalized driver genes.