Identification of the functional alteration signatures across different cancer types with support vector machine and feature analysis

Biochim Biophys Acta Mol Basis Dis. 2018 Jun;1864(6 Pt B):2218-2227. doi: 10.1016/j.bbadis.2017.12.026. Epub 2017 Dec 19.

Abstract

Cancers are regarded as malignant proliferations of tumor cells that can arise in many tissues and organs and can severely curtail the quality of human life. The potential of using plasma DNA for cancer detection has been widely recognized, creating a need to map the tissue-of-origin through the identification of somatic mutations. With cutting-edge technologies, such as next-generation sequencing, numerous somatic mutations have been identified, and mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis but exert functional effects. In this study, we applied a pan-cancer analysis to five types of cancers: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on the mutated genes of patients suffering from one of the aforementioned cancer types, each patient was encoded into a large number of numerical values based upon the enrichment theory of gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method, to identify functional alteration features that could be used to build a support vector machine (SVM)-based classifier for distinguishing the five types of cancers. Our results showed that the optimal classifier, built on 344 selected features, had the highest Matthews correlation coefficient value of 0.523. Sixteen decision rules produced by the MCFS method yielded an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments.
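The encoding of a patient's mutated genes into GO and KEGG enrichment values might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the enrichment score, so the sketch assumes the commonly used form, the negative base-10 logarithm of a hypergeometric upper-tail p-value. The function names (`enrichment_score`, `encode_patient`) and the toy gene and term identifiers are hypothetical.

```python
from math import comb, log10


def enrichment_score(n_background, n_term, n_mutated, n_overlap):
    """Assumed score: -log10 of the hypergeometric p-value P(X >= n_overlap).

    n_background: number of genes in the background set
    n_term:       background genes annotated to the GO term / KEGG pathway
    n_mutated:    mutated genes of the patient (within the background)
    n_overlap:    mutated genes that are annotated to the term
    """
    upper = min(n_term, n_mutated)
    total = comb(n_background, n_mutated)
    # Sum the hypergeometric probabilities for overlaps of n_overlap or more.
    p = sum(
        comb(n_term, k) * comb(n_background - n_term, n_mutated - k)
        for k in range(n_overlap, upper + 1)
    ) / total
    return -log10(p) if p > 0 else float("inf")


def encode_patient(mutated_genes, term_to_genes, background_genes):
    """Map one patient to a {term: enrichment score} feature dictionary."""
    mutated = set(mutated_genes) & background_genes
    features = {}
    for term, genes in term_to_genes.items():
        annotated = genes & background_genes
        features[term] = enrichment_score(
            len(background_genes), len(annotated),
            len(mutated), len(mutated & annotated),
        )
    return features


# Toy data (hypothetical identifiers, not taken from the study).
background = {f"g{i}" for i in range(10)}
terms = {"TERM_A": {"g0", "g1", "g2"}, "TERM_B": {"g8", "g9"}}
patient_features = encode_patient({"g0", "g1"}, terms, background)
```

In this toy case both mutated genes fall in `TERM_A`, so that term receives a positive score, while `TERM_B`, which contains none of the mutated genes, receives a score of zero.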
This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
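The incremental feature selection step and the Matthews correlation coefficient mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the multiclass MCC is computed with Gorodkin's generalization (the abstract does not say which variant was used), the feature ranking is taken as given (for example from MCFS), and `evaluate` is a hypothetical callback standing in for training and cross-validating an SVM on a feature subset. All names here are illustrative.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC (Gorodkin's generalization) from label counts."""
    s = len(y_true)                                   # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                        # true count per class
    p_counts = Counter(y_pred)                        # predicted count per class
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # Convention: return 0 when the coefficient is undefined.
    return cov_tp / denom if denom else 0.0


def incremental_feature_selection(ranked_features, evaluate):
    """Score nested top-k subsets of a ranked list; keep the best subset.

    `evaluate` is a hypothetical callback that returns a score (such as
    a cross-validated MCC) for a given list of features.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:        # ties keep the smaller subset
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Toy usage with made-up scores per subset size (not results from the study).
toy_scores = {1: 0.2, 2: 0.5, 3: 0.4, 4: 0.5}
selected, score = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], lambda feats: toy_scores[len(feats)]
)
```

With these toy scores the loop selects the first two features, since the later tie at four features does not improve on the score already reached.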

Keywords: Cancer prediction; Gene ontology; KEGG; Monte-Carlo feature selection; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA, Neoplasm* / genetics
  • DNA, Neoplasm* / metabolism
  • Gene Expression Regulation, Neoplastic*
  • Gene Ontology*
  • Genes, Neoplasm*
  • Humans
  • Neoplasms* / genetics
  • Neoplasms* / metabolism
  • Support Vector Machine*

Substances

  • DNA, Neoplasm