A genome-wide cis-regulatory element discovery method based on promoter sequences and gene co-expression networks

BMC Genomics. 2013;14 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2164-14-S1-S4. Epub 2013 Jan 21.

Abstract

Background: Deciphering cis-regulatory networks has become an attractive yet challenging task. This paper presents a simple method for cis-regulatory network discovery that aims to avoid some common problems of previous approaches.

Results: Using promoter sequences and gene expression profiles as input, our method does not cluster genes by their expression data; instead, it uses the co-expression neighborhood of each individual gene, thereby overcoming a disadvantage of current clustering-based models, which may miss information specific to individual genes. In addition, rather than taking a motif database as input, it builds a simple motif count table that records the count of each enumerated k-mer in each gene promoter sequence. Thus, it can be used for species with no prior knowledge of cis-regulatory motifs and has the potential to discover new transcription factor binding sites. Applications to Saccharomyces cerevisiae and Arabidopsis show that our method has good prediction accuracy and outperforms a phylogenetic footprinting approach. Furthermore, the top-ranked gene-motif regulatory clusters are evidently functionally co-regulated, and the regulatory relationships between the motifs and the enriched biological functions can often be confirmed by the literature.
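
The two inputs described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the neighborhood size, or how motifs are scored, so Pearson correlation, a fixed top-N neighborhood, the function names, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_count_table(promoters, k):
    """Count every overlapping k-mer in each promoter (gene -> {k-mer: count})."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    table = {}
    for gene, seq in promoters.items():
        seq = seq.upper()
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        # Keep a column for every enumerated k-mer, including absent ones.
        table[gene] = {kmer: counts.get(kmer, 0) for kmer in kmers}
    return table


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def coexpression_neighborhood(expression, gene, top_n):
    """Return the top_n genes most correlated with `gene` (assumed definition)."""
    scores = [
        (other, pearson(expression[gene], profile))
        for other, profile in expression.items()
        if other != gene
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]


# Hypothetical toy inputs: three promoters and three expression profiles.
promoters = {"g1": "ACGTACGT", "g2": "ACGTTTTT", "g3": "GGGGCCCC"}
expression = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5], "g3": [4, 3, 2, 1]}

table = kmer_count_table(promoters, k=3)
neighbors = coexpression_neighborhood(expression, "g1", top_n=1)
```

Because each gene gets its own neighborhood, no global clustering step is needed; a downstream score could then compare a k-mer's counts within a gene's neighborhood against the rest of the genome, although the abstract does not state how that comparison is made.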

Conclusions: Because this method is simple and gene-specific, it can be readily applied to insufficiently studied species, or used flexibly as an additional step or data source for existing transcriptional regulatory network discovery models.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Arabidopsis / genetics*
  • Cluster Analysis
  • Computational Biology
  • Gene Expression Profiling
  • Gene Regulatory Networks
  • Genome, Fungal*
  • Genome, Plant*
  • Multigene Family
  • Promoter Regions, Genetic
  • Saccharomyces cerevisiae / genetics*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Transcription Factors