Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies

BMC Bioinformatics. 2017 Mar 1;18(1):141. doi: 10.1186/s12859-017-1495-1.

Abstract

Background: Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. Approaches for de-novo motif discovery can be subdivided into phylogenetic footprinting, which takes into account phylogenetic dependencies in aligned sequences of more than one species, and non-phylogenetic approaches, which are based on sequences from only one species and typically take into account intra-motif dependencies. It has been shown that modeling either (i) phylogenetic dependencies or (ii) intra-motif dependencies on its own improves de-novo motif discovery, but there is no approach capable of modeling both (i) and (ii) simultaneously.

Results: Here, we present an approach for de-novo motif discovery that combines phylogenetic footprinting with motif models capable of taking into account intra-motif dependencies. We study the degree of intra-motif dependencies inferred by this approach from ChIP-seq data of 35 transcription factors. We find that significant intra-motif dependencies of orders 1 and 2 are present in all 35 datasets and that intra-motif dependencies of order 2 are typically stronger than those of order 1. We also find that the presented approach improves the classification performance of phylogenetic footprinting in all 35 datasets and that incorporating intra-motif dependencies of order 2 yields a higher classification performance than incorporating such dependencies of only order 1.
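To make the notion of intra-motif dependencies of a given order concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain inhomogeneous Markov model in which the base at each motif position is conditioned on up to `order` preceding bases, so that order 0 reduces to a position weight matrix. It omits the phylogenetic footprinting component entirely. The function names `train_motif_model` and `log_likelihood`, the pseudocount, and the toy binding sites are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sites contain only these four bases


def train_motif_model(sites, order=1, pseudocount=1.0):
    """Estimate P(base at position j | up to `order` preceding bases).

    Hypothetical helper: order=0 gives an independent-position model,
    order=1 and order=2 add dependencies on neighboring positions.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    model = []
    for j in range(width):
        k = min(order, j)  # fewer predecessors exist near the left edge
        counts = defaultdict(lambda: dict.fromkeys(ALPHABET, pseudocount))
        for s in sites:
            counts[s[j - k:j]][s[j]] += 1.0
        probs = {}
        for context, base_counts in counts.items():
            total = sum(base_counts.values())
            probs[context] = {b: n / total for b, n in base_counts.items()}
        model.append((k, probs))
    return model


def log_likelihood(model, site):
    """Log-probability of `site` under the model.

    Contexts never seen in training fall back to a uniform distribution,
    which is what the pseudocounts alone would yield.
    """
    uniform = 1.0 / len(ALPHABET)
    total = 0.0
    for j, (k, probs) in enumerate(model):
        context = site[j - k:j]
        total += math.log(probs.get(context, {}).get(site[j], uniform))
    return total


# Toy data (made up): positions 2 and 3 co-vary, "CG" or "TA".
toy_sites = ["ACGT", "ACGT", "ATAT", "ATAT"]
order0 = train_motif_model(toy_sites, order=0)
order1 = train_motif_model(toy_sites, order=1)

for candidate in ("ACGT", "ACAT"):
    print(candidate,
          round(log_likelihood(order0, candidate), 3),
          round(log_likelihood(order1, candidate), 3))
```

In this toy setting the order-0 model cannot tell the observed combination "ACGT" from the unobserved mixture "ACAT", because each position is scored independently, whereas the order-1 model assigns the mixture a lower likelihood. This is only meant to illustrate why higher-order dependencies can help a classifier separate binding sites from non-sites.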

Conclusion: Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies improves the classification of transcription factor binding sites. This may advance our understanding of transcriptional gene regulation and its evolution.

Keywords: ChIP-Seq; Evolution; Gene regulation; Phylogenetic footprinting; Transcription factor binding sites.

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Binding Sites / genetics
  • Chromatin / metabolism
  • DNA / chemistry
  • DNA / metabolism
  • Humans
  • Models, Molecular*
  • Phylogeny
  • Protein Binding
  • Protein Domains
  • Sequence Analysis, DNA
  • Transcription Factors / classification*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Chromatin
  • Transcription Factors
  • DNA