DriverRWH: discovering cancer driver genes by random walk on a gene mutation hypergraph

BMC Bioinformatics. 2022 Jul 13;23(1):277. doi: 10.1186/s12859-022-04788-7.

Abstract

Background: Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is identification of a few cancer driver genes whose mutations cause tumor growth. However, the majority of existing computational approaches underuse the co-occurrence mutation information of the individuals, which are deemed to be important in tumorigenesis and tumor progression, resulting in high rate of false positive.

Results: To make full use of co-mutation information, we present a random walk algorithm referred to as DriverRWH on a weighted gene mutation hypergraph model, using somatic mutation data and molecular interaction network data to prioritize candidate driver genes. Applied to tumor samples of different cancer types from The Cancer Genome Atlas, DriverRWH shows significantly better performance than state-of-art prioritization methods in terms of the area under the curve scores and the cumulative number of known driver genes recovered in top-ranked candidate genes. Besides, DriverRWH discovers several potential drivers, which are enriched in cancer-related pathways. DriverRWH recovers approximately 50% known driver genes in the top 30 ranked candidate genes for more than half of the cancer types. In addition, DriverRWH is also highly robust to perturbations in the mutation data and gene functional network data.

Conclusion: DriverRWH is effective among various cancer types in prioritizes cancer driver genes and provides considerable improvement over other tools with a better balance of precision and sensitivity. It can be a useful tool for detecting potential driver genes and facilitate targeted cancer therapies.

Keywords: Cancer driver genes; Candidate gene prioritization; Gene network; Hypergraph model; Random walk; Somatic mutation.

MeSH terms

  • Genomics / methods
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mutation
  • Neoplasms* / genetics
  • Oncogenes*