LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

Sci Rep. 2016 Jan 11:6:18871. doi: 10.1038/srep18871.

Abstract

Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, widely used ORA tools rely on statistical methods such as Fisher's exact test, which reduce a pathway to a list of genes and ignore both the functionally non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes both types of information into account by incorporating network-based gene weights into ORA. In three benchmarks, LEGO achieves better performance than Fisher's exact test and three other network-based methods. To further evaluate LEGO's usefulness, we compare it with five gene expression-based and three pathway topology-based methods on a benchmark of 34 disease gene expression datasets compiled in a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes and identify gene sets relevant to autism that could not be found by Fisher's exact test.
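
For readers unfamiliar with the baseline, the Fisher-style ORA that the abstract contrasts LEGO with can be sketched roughly as below. This is a minimal illustration of the one-sided hypergeometric (Fisher's exact) test only, not of LEGO: the abstract does not describe how the network-based gene weights are computed, so none are modeled here. The function name `ora_pvalue` and the toy gene identifiers are assumptions made for the example.

```python
from math import comb


def ora_pvalue(gene_set, hits, background):
    """One-sided hypergeometric p-value for over-representation.

    Every gene counts equally here, which is exactly the simplification
    the abstract criticizes: no gene weights, no gene-gene interactions.
    """
    background = set(background)
    gene_set = set(gene_set) & background   # restrict to the background
    hits = set(hits) & background

    N = len(background)        # genes in the background
    K = len(gene_set)          # genes annotated to the gene set
    n = len(hits)              # genes in the input list
    k = len(gene_set & hits)   # overlap between input list and gene set

    # P(X >= k) for X ~ Hypergeometric(N, K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy data (hypothetical gene identifiers g0..g19)
background = [f"g{i}" for i in range(20)]
pathway = background[:5]
input_list = ["g0", "g1", "g2", "g10", "g11"]
print(ora_pvalue(pathway, input_list, background))
```

Because the test depends only on the four counts (background size, gene set size, input list size, overlap), two input lists with the same overlap size receive the same p-value regardless of which genes overlap or how they are connected.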

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Autistic Disorder / genetics*
  • Autistic Disorder / physiopathology
  • Benchmarking
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / pathology
  • Databases, Genetic
  • Datasets as Topic
  • Epistasis, Genetic*
  • Female
  • Gene Expression Profiling
  • Gene Ontology
  • Gene Regulatory Networks*
  • Humans
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae / metabolism