TargetOrtho: a phylogenetic footprinting tool to identify transcription factor targets

Genetics. 2014 May;197(1):61-76. doi: 10.1534/genetics.113.160721. Epub 2014 Feb 20.

Abstract

Identifying the regulatory targets of transcription factors is central to understanding how these factors fulfill their many key roles in development and homeostasis. DNA-binding sites have been uncovered for many transcription factors through a number of experimental approaches, but it has proven difficult to use this binding-site information to reliably predict transcription factor target genes from genomic sequence. Using the nematode Caenorhabditis elegans and related nematode species as a starting point, we describe here a bioinformatic pipeline, TargetOrtho, that identifies potential transcription factor target genes from genomic sequences. One key feature of this pipeline is its use of sequence conservation of transcription-factor-binding sites in related species. Rather than using aligned genomic DNA sequences from the genomes of multiple species as a starting point, TargetOrtho scans related genome sequences independently for matches to user-provided transcription-factor-binding motifs, assigns motif matches to adjacent genes, and then determines whether orthologous genes in different species also contain motif matches. We validate TargetOrtho by identifying previously characterized targets of three different types of transcription factors in C. elegans, and we use TargetOrtho to identify novel target genes of the Collier/Olf/EBF transcription factor UNC-3 in C. elegans ventral nerve cord motor neurons. We have also implemented TargetOrtho for Drosophila melanogaster, using conservation among five species in the D. melanogaster species subgroup for target gene discovery.
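The three steps named in the abstract (scan each genome independently, assign motif matches to adjacent genes, check orthologs for matches) can be illustrated with a minimal sketch. This is not TargetOrtho's actual code or interface. Several things are assumptions made for illustration: the motif is given as an IUPAC consensus string rather than a scored position weight matrix, only the forward strand is scanned, each match is assigned to the single nearest gene, and the names Gene, genes_with_motif, and conserved_targets as well as the layout of the genomes and orthologs dictionaries are hypothetical.

```python
import re
from dataclasses import dataclass

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: half-open interval [start, end)."""
    name: str
    start: int
    end: int


def motif_to_regex(consensus):
    """Compile a consensus motif; the lookahead lets overlapping matches be found."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan(sequence, pattern):
    """Step 1: start positions of all forward-strand motif matches."""
    return [m.start() for m in pattern.finditer(sequence.upper())]


def nearest_gene(pos, genes):
    """Step 2: assign a match position to the closest gene (0 if inside it)."""
    def distance(gene):
        if gene.start <= pos < gene.end:
            return 0
        return min(abs(pos - gene.start), abs(pos - (gene.end - 1)))
    return min(genes, key=distance)


def genes_with_motif(sequence, genes, pattern):
    """Names of genes that have at least one motif match assigned to them."""
    if not genes:
        return set()
    return {nearest_gene(p, genes).name for p in scan(sequence, pattern)}


def conserved_targets(reference, genomes, orthologs, consensus):
    """Step 3: for each reference gene with a match, count the other species
    whose ortholog of that gene also has a match.

    genomes:   {species: (sequence, [Gene, ...])}
    orthologs: {reference_gene_name: {species: ortholog_gene_name}}
    """
    pattern = motif_to_regex(consensus)
    hits = {
        species: genes_with_motif(seq, genes, pattern)
        for species, (seq, genes) in genomes.items()
    }
    return {
        gene: sum(
            1
            for species, ortholog in orthologs.get(gene, {}).items()
            if ortholog in hits.get(species, set())
        )
        for gene in sorted(hits[reference])
    }
```

Because each genome is scanned on its own, the sketch never needs a multi-species alignment; conservation is judged only at the level of "does the ortholog also carry a match", which mirrors the alignment-free idea described above.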

Keywords: C. elegans; cis-regulatory element; transcription factor.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Binding Sites
  • Caenorhabditis elegans / genetics
  • Caenorhabditis elegans / metabolism
  • Caenorhabditis elegans Proteins / metabolism
  • Chromatin Immunoprecipitation
  • Computational Biology / methods*
  • Gene Expression Profiling
  • Gene Ontology
  • Introns / genetics
  • Phylogeny*
  • Transcription Factors / metabolism*
  • User-Computer Interface

Substances

  • Caenorhabditis elegans Proteins
  • Transcription Factors
  • unc-3 protein, C elegans