A new correlation clustering method for cancer mutation analysis

Bioinformatics. 2016 Dec 15;32(24):3717-3728. doi: 10.1093/bioinformatics/btw546. Epub 2016 Aug 18.

Abstract

Motivation: Cancer genomes exhibit a large number of different alterations that affect many genes in a diverse manner. An improved understanding of the generative mechanisms behind the mutation rules and their influence on gene community behavior is of great importance for the study of cancer.

Results: To expand our capability to analyze combinatorial patterns of cancer alterations, we developed a rigorous methodology for cancer mutation pattern discovery based on a new, constrained form of correlation clustering. Our new algorithm, named C3 (Cancer Correlation Clustering), leverages mutual exclusivity of mutations, patient coverage and driver network concentration principles. To test C3, we performed a detailed analysis on TCGA breast cancer and glioblastoma data and showed that our algorithm outperforms the state-of-the-art CoMEt method in terms of discovering mutually exclusive gene modules and identifying biologically relevant driver genes. The proposed agnostic clustering method represents a unique tool for efficient and reliable identification of mutation patterns and driver pathways in large-scale cancer genomics studies, and it may also be used for other clustering problems on biological graphs.
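As a rough illustration of two of the principles named above (mutual exclusivity and patient coverage), the sketch below scores a candidate grouping of genes with a plain, unconstrained correlation clustering objective. It is not the C3 algorithm, and it omits C3's constraints and the driver network concentration principle. The gene and patient identifiers, the function names and the rule "a pair is positive when no patient carries both mutations" are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy data: gene -> set of patients carrying a mutation in it.
mutations = {
    "G1": {"p1", "p2"},
    "G2": {"p3", "p4"},
    "G3": {"p1", "p2", "p5"},
    "G4": {"p6"},
}
N_PATIENTS = 6


def coverage(mutations, genes, n_patients):
    """Fraction of patients with at least one mutated gene in the set."""
    covered = set()
    for gene in genes:
        covered |= mutations[gene]
    return len(covered) / n_patients


def edge_sign(mutations, g1, g2):
    """+1 if the two genes are never mutated in the same patient, else -1.

    This labelling rule is an assumption for the sketch, not the paper's
    weighting scheme.
    """
    return 1 if not (mutations[g1] & mutations[g2]) else -1


def disagreements(mutations, clusters):
    """Count correlation clustering disagreements for a partition.

    A disagreement is a positive pair split across clusters or a
    negative pair placed in the same cluster.
    """
    label = {g: i for i, cluster in enumerate(clusters) for g in cluster}
    cost = 0
    for g1, g2 in combinations(sorted(label), 2):
        same_cluster = label[g1] == label[g2]
        positive = edge_sign(mutations, g1, g2) == 1
        if positive != same_cluster:
            cost += 1
    return cost


split = [{"G1", "G2", "G4"}, {"G3"}]
merged = [{"G1", "G2", "G3", "G4"}]
print(coverage(mutations, ["G1", "G2"], N_PATIENTS))
print(disagreements(mutations, split), disagreements(mutations, merged))
```

A clustering with fewer disagreements groups genes whose mutations rarely co-occur. A practical method would additionally weight pairs and restrict cluster sizes, which this sketch does not attempt.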

Availability and implementation: The source code for the C3 method can be found at https://github.com/jackhou2/C3

Contacts: jianma@cs.cmu.edu or milenkov@illinois.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms*
  • Breast Neoplasms / genetics*
  • Cluster Analysis*
  • Computational Biology / methods*
  • DNA Mutational Analysis / methods*
  • Female
  • Gene Regulatory Networks
  • Glioblastoma / genetics*
  • Humans
  • Mutation