Undersampling Genomes has Biased Time and Rate Estimates Throughout the Tree of Life

Mol Biol Evol. 2018 Aug 1;35(8):2077-2084. doi: 10.1093/molbev/msy103.

Abstract

Genomic data drive evolutionary research on the relationships and timescale of life but the genomes of most species remain poorly sampled. Phylogenetic trees can be reconstructed reliably using small data sets and the same has been assumed for the estimation of divergence time with molecular clocks. However, we show here that undersampling of molecular data results in a bias expressed as disproportionately shorter branch lengths and underestimated divergence times in the youngest nodes and branches, termed the small sample artifact. In turn, this leads to increasing speciation and diversification rates towards the present. Any evolutionary analyses derived from these biased branch lengths and speciation rates will be similarly biased. The widely used timetrees of the major species-rich studies of amphibians, birds, mammals, and squamate reptiles are all data-poor and show upswings in diversification rate, suggesting that their results were biased by undersampling. Our results show that greater sampling of genomes is needed for accurate time and rate estimation, which are basic data used in ecological and evolutionary research.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Animals
  • Biological Evolution*
  • Genetic Techniques*
  • Genome*
  • Time Factors