Network embedding framework for driver gene discovery by combining functional and structural information

BMC Genomics. 2023 Jul 29;24(1):426. doi: 10.1186/s12864-023-09515-x.

Abstract

Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but combining the functional and structural information of genes in protein interaction networks to identify driver genes remains underexplored. We therefore propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and structural information of genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four previous well-performing methods, our method can find gene pairs that are distant from each other in the network through their structural similarities, and it performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas (TCGA). We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective on feature selection for identifying novel driver genes.
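
The first step above, spreading mutation signal over a gene interaction network, can be illustrated with a generic random walk with restart. This is a minimal sketch only: the abstract does not give the propagation formula or its parameters, so the `propagate` function, the `restart` value of 0.5, and the four-gene toy network are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def propagate(adj, scores, restart=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart on an undirected network (illustrative only).

    adj     : square adjacency matrix of the gene interaction network
    scores  : initial per-gene mutation scores
    restart : probability of jumping back to the initial scores (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    p0 = np.asarray(scores, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0          # avoid division by zero for isolated genes
    w = adj / deg                # column-normalized transition matrix
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * (w @ p) + restart * p0
        if np.abs(nxt - p).sum() < tol:   # stop once the scores stabilize
            return nxt
        p = nxt
    return p

# Hypothetical toy network: a path g0 - g1 - g2 - g3 where only g0 is mutated.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
mutation = np.array([1.0, 0.0, 0.0, 0.0])
smoothed = propagate(adj, mutation)
print(smoothed)
```

In this toy case the propagated score decreases with network distance from the mutated gene, so unmutated neighbors still receive a nonzero weight. In a pipeline like the one described, such smoothed scores would be attached to the network before an embedding model and a classifier are applied; those later steps are not shown here.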

Keywords: Classification algorithm; Driver gene; Gene interaction network; Mutation data; Network embedding.

MeSH terms

  • Algorithms*
  • Gene Regulatory Networks
  • Genetic Association Studies
  • Machine Learning
  • Protein Interaction Mapping