Vestige: maximum likelihood phylogenetic footprinting

BMC Bioinformatics. 2005 May 29;6:130. doi: 10.1186/1471-2105-6-130.

Abstract

Background: Phylogenetic footprinting is the identification of functional regions of DNA by their evolutionary conservation. This is achieved by comparing orthologous regions from multiple species and identifying the DNA regions that have diverged less than neutral DNA. Vestige is a phylogenetic footprinting package built on the PyEvolve toolkit. It uses probabilistic molecular evolutionary modelling to represent aspects of sequence evolution, including the conventional divergence measure employed by other footprinting approaches. In addition to measuring divergence, Vestige expands the definition of a phylogenetic footprint to include variation in the distribution of any molecular evolutionary process. It does so by displaying the distribution of model parameters that represent partitions of molecular evolutionary substitutions. Examining the spatial incidence of these effects across regions of the genome can identify DNA segments that differ in the nature of the evolutionary process.
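The conventional divergence-based footprint can be sketched as a sliding window over an alignment, flagging windows that have diverged less than neutral DNA. This is a minimal sketch, not Vestige's implementation. It assumes only two aligned sequences and uses the Jukes-Cantor (JC69) pairwise distance, which is the maximum-likelihood distance under that model, in place of a likelihood computed over a multi-species tree. The function names, the `neutral_distance` baseline and the `ratio` threshold rule are hypothetical.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Maximum-likelihood JC69 distance (substitutions per site) between two
    aligned sequences. Columns containing a gap ('-') are skipped."""
    sites = 0
    diffs = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue
        sites += 1
        diffs += x != y
    if sites == 0:
        return float("nan")  # no comparable sites in this window
    p = diffs / sites
    if p >= 0.75:
        return float("inf")  # saturated: the JC69 distance is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def footprint(seq_a, seq_b, window, step, neutral_distance, ratio=0.5):
    """Return (start, end, distance) for every window whose divergence is below
    `ratio` times the assumed neutral divergence (hypothetical threshold rule)."""
    hits = []
    for start in range(0, len(seq_a) - window + 1, step):
        end = start + window
        d = jc69_distance(seq_a[start:end], seq_b[start:end])
        if d < ratio * neutral_distance:  # NaN windows compare False and are skipped
            hits.append((start, end, d))
    return hits
```

For example, `footprint(a, b, window=10, step=10, neutral_distance=0.5)` reports the windows of the alignment `a`/`b` that are less than half as diverged as the assumed neutral rate. A tree-based likelihood, as used by Vestige, would instead pool information from all species at once.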

Results: Vestige was applied to a reference dataset of the SCL locus from four species and clearly identified the known conserved regions in this dataset. To demonstrate its flexibility in using diverse models of molecular evolution and in dissecting the nature of the evolutionary process, Vestige was used to footprint the Ka/Ks ratio in primate BRCA1 with a codon model of evolution. Two regions of putative adaptive evolution were identified, illustrating the ability of Vestige to represent the spatial distribution of distinct molecular evolutionary processes.
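In a codon model, Ka/Ks enters as a single parameter, ω, that scales the rate of nonsynonymous substitutions. The sketch below builds an unnormalised instantaneous rate matrix in the style of the Goldman-Yang codon model over the 61 sense codons of the standard genetic code. It is an illustration under stated assumptions, not necessarily the parameterisation used by Vestige or PyEvolve: κ (transition/transversion ratio), ω, equal codon frequencies and all function names are assumptions, and the matrix is not rescaled to one expected substitution per unit time.

```python
BASES = "TCAG"
# NCBI translation table 1 (standard genetic code); codons in TTT, TTC, TTA, ..., GGG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def standard_code():
    """Map each of the 64 codons to its one-letter amino acid ('*' = stop)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, AMINO_ACIDS))


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return {x, y} in ({"A", "G"}, {"C", "T"})


def codon_rate_matrix(kappa, omega, freqs=None):
    """Unnormalised Goldman-Yang-style rate matrix over the 61 sense codons.

    Off-diagonal entries are pi_j, multiplied by kappa for a transition and by
    omega (the Ka/Ks ratio) for a nonsynonymous change; codon pairs differing
    at more than one position get rate zero. Each row sums to zero."""
    code = standard_code()
    codons = [c for c, aa in code.items() if aa != "*"]  # drop TAA, TAG, TGA
    n = len(codons)
    if freqs is None:
        freqs = [1.0 / n] * n  # assumption: equal codon frequencies
    Q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(codons):
        for j, cj in enumerate(codons):
            diffs = [k for k in range(3) if ci[k] != cj[k]]
            if len(diffs) != 1:
                continue  # skips i == j and multi-nucleotide changes
            k = diffs[0]
            rate = freqs[j]
            if is_transition(ci[k], cj[k]):
                rate *= kappa
            if code[ci] != code[cj]:
                rate *= omega  # nonsynonymous change
            Q[i][j] = rate
        Q[i][i] = -sum(Q[i])  # diagonal makes the row sum to zero
    return codons, Q
```

With ω fitted separately in each alignment window, values above 1 would point to candidate regions of adaptive evolution, while values well below 1 indicate purifying selection.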

Conclusion: Vestige provides a flexible, open platform for phylogenetic footprinting. Underpinned by the PyEvolve toolkit, it provides a framework for visualising the signatures of evolutionary processes across the genomes of numerous organisms simultaneously. Because Vestige exploits the maximum-likelihood statistical framework, the complex interplay between mutational processes, DNA repair and selection can be evaluated both spatially (along a sequence alignment) and temporally (for each branch of the tree), providing visual indicators of the attributes and functions of DNA sequences.
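One standard way a maximum-likelihood framework evaluates such hypotheses is the likelihood ratio test between nested models. The sketch below assumes the two maximised log-likelihoods are already available (for example, from fitting a window with ω fixed at 1 and with ω free) and that the models differ by one free parameter; the function name is hypothetical and this is not presented as Vestige's own procedure.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models that differ by one free parameter.

    Returns (statistic, p_value) using the chi-square(1) reference distribution,
    whose survival function is erfc(sqrt(x / 2))."""
    # Clamp at zero to guard against tiny negative values from optimiser noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Applied window by window, a small p-value together with a free estimate of ω above 1 would mark a candidate region of adaptive evolution; repeating the comparison with branch-specific parameters gives the per-branch (temporal) view.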

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • BRCA1 Protein / genetics
  • Base Sequence
  • Codon
  • Computational Biology / methods*
  • Computer Simulation
  • DNA / chemistry
  • DNA Repair
  • Data Interpretation, Statistical*
  • Evolution, Molecular
  • Genome
  • Humans
  • Likelihood Functions
  • Models, Biological
  • Models, Statistical
  • Phylogeny
  • Programming Languages
  • Regulatory Sequences, Nucleic Acid
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Sequence Analysis, Protein
  • Software
  • Species Specificity
  • Time Factors

Substances

  • BRCA1 Protein
  • Codon
  • DNA