Estimation of phylogeny and invariant sites under the general Markov model of nucleotide sequence evolution

Syst Biol. 2007 Apr;56(2):155-62. doi: 10.1080/10635150701247921.

Abstract

The models of nucleotide substitution used by most maximum likelihood-based methods assume that the evolutionary process is stationary, reversible, and homogeneous. We present an extension of the Barry and Hartigan model that can be used to estimate parameters by maximum likelihood (ML) when the data contain invariant sites and violate the assumptions of stationarity, reversibility, and homogeneity. Unlike most ML methods for estimating invariant sites, our method estimates the nucleotide composition of invariant sites separately from that of variable sites. We analyze a bacterial data set in which problems due to lack of stationarity and homogeneity have been well documented, and we use the parametric bootstrap to show that the data are consistent with our general Markov model. We also show that our method yields fairly accurate estimates of invariant sites when applied to data simulated under the general Markov model.
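
The idea of giving invariant sites their own nucleotide composition can be illustrated with a small mixture likelihood. The sketch below is not the authors' implementation. It assumes a three-leaf star tree, a hypothetical helper `make_edge_matrix`, and made-up parameter values, and it uses one unconstrained transition matrix per edge in the spirit of a general Markov model. A site pattern is either invariant with probability `p_inv`, in which case all leaves share a state drawn from `inv_freqs`, or variable, in which case it evolves from `root_freqs` along the edges.

```python
from itertools import product

# Nucleotide states are encoded as integers: 0=A, 1=C, 2=G, 3=T.

def make_edge_matrix(stay, weights):
    """Hypothetical helper: build a 4x4 row-stochastic transition matrix.

    Each row keeps its state with probability `stay` and spreads the rest
    over the other states in proportion to `weights`. The result need not
    be symmetric or reversible.
    """
    matrix = []
    for i in range(4):
        off_total = sum(weights[k] for k in range(4) if k != i)
        row = [stay if j == i else (1.0 - stay) * weights[j] / off_total
               for j in range(4)]
        matrix.append(row)
    return matrix

def variable_site_likelihood(pattern, root_freqs, edge_matrices):
    """Probability of a leaf pattern on a star tree, summing over root states."""
    total = 0.0
    for root_state in range(4):
        term = root_freqs[root_state]
        for leaf_state, matrix in zip(pattern, edge_matrices):
            term *= matrix[root_state][leaf_state]
        total += term
    return total

def site_likelihood(pattern, p_inv, inv_freqs, root_freqs, edge_matrices):
    """Mixture of an invariant class and a variable class.

    The invariant class contributes only to constant patterns and uses its
    own composition `inv_freqs`, separate from `root_freqs`.
    """
    is_constant = len(set(pattern)) == 1
    inv_term = inv_freqs[pattern[0]] if is_constant else 0.0
    var_term = variable_site_likelihood(pattern, root_freqs, edge_matrices)
    return p_inv * inv_term + (1.0 - p_inv) * var_term

# Illustrative (assumed) parameter values, not estimates from the paper.
P_INV = 0.3
INV_FREQS = [0.1, 0.4, 0.4, 0.1]    # composition of invariant sites
ROOT_FREQS = [0.2, 0.3, 0.3, 0.2]   # root composition of variable sites
EDGES = [
    make_edge_matrix(0.90, [1.0, 2.0, 2.0, 1.0]),
    make_edge_matrix(0.80, [2.0, 1.0, 1.0, 2.0]),
    make_edge_matrix(0.85, [1.0, 1.0, 3.0, 1.0]),
]

if __name__ == "__main__":
    for pattern in [(1, 1, 1), (0, 1, 2)]:
        print(pattern, site_likelihood(pattern, P_INV, INV_FREQS, ROOT_FREQS, EDGES))
```

As a sanity check on the sketch, the likelihoods of all 64 possible patterns should sum to 1, and a non-constant pattern receives no contribution from the invariant class.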

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacillus subtilis / classification
  • Bacillus subtilis / genetics
  • Bacteria / classification
  • Bacteria / genetics*
  • Base Sequence
  • Deinococcus / classification
  • Deinococcus / genetics
  • Evolution, Molecular*
  • Likelihood Functions
  • Markov Chains
  • Models, Genetic*
  • Phylogeny*
  • RNA, Ribosomal, 16S / chemistry
  • Sequence Analysis, DNA
  • Thermotoga maritima / classification
  • Thermotoga maritima / genetics
  • Thermus thermophilus / classification
  • Thermus thermophilus / genetics

Substances

  • RNA, Ribosomal, 16S