Statistical measures of uncertainty for branches in phylogenetic trees inferred from molecular sequences by using model-based methods

J Appl Genet. 2008;49(1):49-67. doi: 10.1007/BF03195249.

Abstract

In recent years, the emphasis of theoretical work on phylogenetic inference has shifted from developing new tree inference methods to developing methods that measure the statistical support for topologies. This paper reviews 3 approaches for assigning support values to branches in trees obtained from the analysis of molecular sequences: the bootstrap, Bayesian posterior probabilities for clades, and interior branch tests. In some circumstances, these methods give different answers. This should not be surprising, because their assumptions differ. The interior branch tests assume that a given topology is true and consider only whether a particular branch length is greater than zero. If the tree is incorrect, a wrong branch (which low bootstrap or Bayesian support may indicate) can still have a non-zero length. If the substitution model is oversimplified, the length of a branch may be overestimated, and the Bayesian support for the branch may be inflated. The bootstrap, on the other hand, approximates the variance of the data under the real model of sequence evolution, because it resamples directly from the data. A discrepancy between Bayesian support and bootstrap support may therefore signal model inaccuracy. In practice, use of all 3 methods is recommended; if discrepancies are observed, their potential origins should be analyzed carefully.
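The resampling step behind the bootstrap can be illustrated with a minimal Python sketch. It is not the procedure of any program discussed in the review. The toy four-taxon alignment, the function names, and the use of uncorrected p-distances with a four-point criterion in place of a model-based tree search are all assumptions made to keep the example short. It shows only how alignment columns are drawn with replacement and how the support for a branch is counted across pseudo-replicates.

```python
import random

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement (one bootstrap pseudo-replicate)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def best_quartet_split(alignment):
    """Hypothetical stand-in for tree inference on exactly four taxa:
    return the split whose two within-pair distances have the smallest sum."""
    a, b, c, d = sorted(alignment)
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def score(split):
        (p, q), (r, s) = split
        return (p_distance(alignment[p], alignment[q])
                + p_distance(alignment[r], alignment[s]))

    return min(splits, key=score)

def bootstrap_support(alignment, n_replicates=200, seed=0):
    """Fraction of pseudo-replicates that recover the split found in the original data."""
    rng = random.Random(seed)
    reference = best_quartet_split(alignment)
    hits = sum(
        best_quartet_split(resample_columns(alignment, rng)) == reference
        for _ in range(n_replicates)
    )
    return reference, hits / n_replicates

# Toy alignment (invented for illustration only).
toy_alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TTGTACCTGC",
    "D": "TTGTACCTGA",
}

split, support = bootstrap_support(toy_alignment)
print(split, support)
```

In a real analysis the call to `best_quartet_split` would be replaced by a full tree inference under a chosen substitution model, and the support would be tallied for every interior branch of the reference tree rather than for a single split.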

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Bayes Theorem
  • Computers, Molecular / statistics & numerical data
  • Computers, Molecular / trends
  • Humans
  • Models, Genetic*
  • Phylogeny*
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / statistics & numerical data*
  • Sequence Analysis, DNA / trends
  • Sequence Analysis, Protein / methods*
  • Sequence Analysis, Protein / statistics & numerical data*
  • Sequence Analysis, Protein / trends
  • Uncertainty*