Surveying phylogenetic footprints in large gene clusters: applications to Hox cluster duplications

Mol Phylogenet Evol. 2004 May;31(2):581-604. doi: 10.1016/j.ympev.2003.08.009.

Abstract

Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. Since these elements are subject to stabilizing selection, they evolve much more slowly than adjacent non-functional DNA. These so-called phylogenetic footprints can be detected by comparing the sequences surrounding orthologous genes in different species. Therefore, the loss of phylogenetic footprints, as well as the acquisition of conserved non-coding sequences in some lineages but not in others, can provide evidence for the evolutionary modification of cis-regulatory elements. We introduce here a statistical model of footprint evolution that allows us to estimate the loss of sequence conservation that can be attributed to gene loss and other structural reasons. This approach to studying the pattern of cis-regulatory element evolution, however, requires the comparison of relatively long sequences from many species. We have therefore developed an efficient software tool for the identification of corresponding footprints in long sequences from multiple species. We apply this novel method to the published sequences of the HoxA clusters of shark, human, and the duplicated zebrafish and Takifugu clusters, as well as to the published HoxB cluster sequences. We find that there is a massive loss of sequence conservation in the intergenic regions of the HoxA clusters, consistent with the finding in [Chiu et al., PNAS 99 (2002) 5492]. The loss of conservation after cluster duplication is more extensive than expected from structural reasons. This suggests that binding site turnover and/or adaptive modification may also contribute to the loss of sequence conservation.
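
As an illustration only, the idea of detecting footprints by cross-species comparison can be sketched in a few lines of Python. This is not the software tool described in the abstract: it assumes the sequences have already been aligned to equal length, treats a footprint as a run of columns that are identical in every species, and uses a hypothetical function name (`conserved_blocks`), a hypothetical length threshold (`min_len`), and made-up toy sequences.

```python
def conserved_blocks(aligned, min_len=6):
    """Return half-open (start, end) column ranges that are identical in all sequences.

    Assumptions (not from the paper): `aligned` is a list of equal-length,
    pre-aligned strings, gaps are written as '-', and a block must span at
    least `min_len` gap-free, fully identical columns to be reported.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    blocks = []
    start = None  # start column of the conserved run currently being extended
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved:
            if start is None:
                start = i
        else:
            # The run (if any) ends here; keep it only if it is long enough.
            if start is not None and i - start >= min_len:
                blocks.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        blocks.append((start, length))
    return blocks


# Toy, made-up alignment of three "species".
toy = [
    "ACGTTTGACCAATCGA",
    "TCGTTTGACGAATCGT",
    "GCGTTTGACTAATCGC",
]
print(conserved_blocks(toy, min_len=6))
```

A realistic analysis would tolerate mismatches, work from local alignments rather than a fixed global alignment, and judge conservation against a background substitution rate. The exact-identity rule here is chosen only to keep the sketch short.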

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Conserved Sequence / genetics
  • Evolution, Molecular
  • Gene Duplication
  • Homeodomain Proteins / classification*
  • Homeodomain Proteins / genetics*
  • Humans
  • Models, Statistical
  • Molecular Sequence Data
  • Multigene Family*
  • Phylogeny*
  • Protein Footprinting
  • Regulatory Sequences, Nucleic Acid*
  • Sequence Analysis, DNA
  • Software*

Substances

  • Homeodomain Proteins