Estimating the frequency of events that cause multiple-nucleotide changes

Genetics. 2004 Aug;167(4):2027-43. doi: 10.1534/genetics.103.023226.

Abstract

Existing mathematical models of DNA sequence evolution assume that all substitutions derive from point mutations. There is, however, increasing evidence that larger-scale events, involving two or more consecutive sites, may also be important. We describe a model, denoted SDT, that allows for single-nucleotide, doublet, and triplet mutations. Applied to protein-coding DNA, the SDT model allows doublet and triplet mutations to overlap codon boundaries but still permits data to be analyzed using the simplifying assumption of independence of sites. We have implemented the SDT model for maximum-likelihood phylogenetic inference and have applied it to an alignment of mammalian globin sequences and to 258 other protein-coding sequence alignments from the Pandit database. We find the SDT model's inclusion of doublet and triplet mutations to be overwhelmingly successful in giving statistically significant improvements in fit of model to data, indicating that larger-scale mutation events do occur. Distributions of inferred parameter values over all alignments analyzed suggest that these events are far more prevalent than previously thought. Detailed consideration of our results and the absence of any known mechanism causing three adjacent nucleotides to be substituted simultaneously, however, lead us to suggest that the actual evolutionary events occurring may include still-larger-scale events, such as gene conversion, inversion, or recombination, or a series of rapid compensatory changes.
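
To make the idea of changes at "two or more consecutive sites" concrete, the sketch below counts runs of adjacent mismatches between two aligned sequences. It is only a naive pairwise illustration and is not the SDT model, which is a maximum-likelihood phylogenetic method; the function name `mismatch_run_lengths` and the example sequences are hypothetical and not taken from the paper.

```python
from collections import Counter


def mismatch_run_lengths(seq_a, seq_b):
    """Count runs of consecutive mismatching sites in two aligned sequences.

    Returns a Counter mapping run length -> number of runs of that length.
    Gap characters, if present, are compared like any other symbol
    (a simplifying assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    runs = Counter()
    current = 0  # length of the mismatch run currently being extended
    for a, b in zip(seq_a, seq_b):
        if a != b:
            current += 1
        else:
            if current:
                runs[current] += 1
            current = 0
    if current:  # close a run that reaches the end of the alignment
        runs[current] += 1
    return runs


# Hypothetical example: sites 5 and 6 (0-based) differ together and straddle
# the boundary between the second and third codons; site 9 differs alone.
example = mismatch_run_lengths("ATGGCCAAGTTT", "ATGGCAGAGCTT")
print(dict(example))
```

A raw count of this kind cannot separate a true doublet event from two independent point mutations that happen to be adjacent, which is why a statistical comparison of model fit, as described in the abstract, is needed.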

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • DNA / chemistry*
  • DNA / genetics*
  • Gene Conversion
  • Globins / chemistry
  • Globins / genetics
  • Mammals
  • Models, Genetic*
  • Models, Statistical
  • Mutation*
  • Polymorphism, Single Nucleotide*

Substances

  • Globins
  • DNA