Efficient biased estimation of evolutionary distances when substitution rates vary across sites

Mol Biol Evol. 2002 Apr;19(4):534-43. doi: 10.1093/oxfordjournals.molbev.a004109.

Abstract

This paper deals with phylogenetic inference when the variability of substitution rates across sites (VRAS) is modeled by a gamma distribution. We show that underestimating VRAS, which results in underestimates of the evolutionary distances between sequences, usually improves the topological accuracy of phylogenetic tree inference by distance-based methods, especially when the molecular clock holds. We propose a method to estimate the gamma shape parameter value that is best suited for tree topology inference, given the sequences at hand. This method is based on the pairwise evolutionary distances between sequences and allows one to reconstruct the phylogeny of a large number of taxa (>1,000). Simulation results show that the topological accuracy is substantially improved when using the gamma shape parameter value given by our method, compared with the true (unknown) value which was used to generate the data. Furthermore, when VRAS is high, the topological accuracy of our distance-based method is better than that of a maximum likelihood approach. Finally, a data set of Maoricicada species sequences is analyzed, which confirms the advantage of our method.
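
The link between the gamma shape parameter and the estimated distance can be illustrated with a small sketch. It assumes the Jukes-Cantor substitution model with gamma-distributed rates, which is a standard textbook correction and not necessarily the model used in the paper; the function name `jc_gamma_distance` and the example values are hypothetical. The sketch only shows that a larger shape parameter (that is, less assumed VRAS) yields a smaller distance for the same observed proportion of differing sites. It does not implement the authors' procedure for choosing the shape parameter.

```python
import math


def jc_gamma_distance(p, alpha=None):
    """Jukes-Cantor distance from the proportion p of differing sites.

    alpha is the gamma shape parameter (assumed model, see lead-in).
    alpha=None means no rate variation across sites (the limit alpha -> infinity).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor model")
    x = 1.0 - 4.0 * p / 3.0
    if alpha is None:
        # Classic Jukes-Cantor correction with equal rates at all sites.
        return -0.75 * math.log(x)
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    # Gamma-corrected Jukes-Cantor distance.
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    p = 0.3  # hypothetical observed proportion of differing sites
    for alpha in (0.5, 1.0, 2.0, None):
        label = "no rate variation" if alpha is None else f"alpha={alpha}"
        print(f"{label}: d = {jc_gamma_distance(p, alpha):.4f}")
```

Under this assumed model, using a shape parameter larger than the true one underestimates VRAS and therefore shrinks every pairwise distance, which is the kind of bias the abstract describes.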

Publication types

  • Comparative Study

MeSH terms

  • Algorithms
  • Amino Acid Substitution / genetics*
  • Animals
  • Computer Simulation
  • Evolution, Molecular*
  • Genetic Variation / genetics*
  • Hemiptera / genetics*
  • Insect Proteins / genetics
  • Mathematics
  • Models, Genetic
  • Mutagenesis / genetics*
  • Phylogeny*
  • Probability

Substances

  • Insect Proteins