Modeling Pulsed Evolution and Time-Independent Variation Improves the Confidence Level of Ancestral and Hidden State Predictions

Syst Biol. 2022 Aug 10;71(5):1225-1232. doi: 10.1093/sysbio/syac016.

Abstract

Ancestral state reconstruction is not only a fundamental tool for studying trait evolution but also very useful for predicting the unknown trait values (hidden states) of extant species. A well-known problem in ancestral and hidden state prediction is that the uncertainty associated with predictions can be so large that the predictions themselves are of little use. Therefore, for meaningful interpretation of predicted traits and for hypothesis testing, it is prudent to assess the uncertainty of the predictions accurately. The commonly used constant-rate Brownian motion (BM) model fails to capture the complexity of the tempo and mode of trait evolution in nature, making predictions under the BM model vulnerable to lack-of-fit errors from model misspecification. Using empirical data (mammalian body size and bacterial genome size), we show that the distribution of residual Z-scores under the BM model is neither homoscedastic nor normal, contrary to the model's expectations. Consequently, the 95% confidence intervals of predicted traits are so unreliable that their actual coverage probability ranges from 33% (strongly permissive) to 100% (strongly conservative). Alternative methods such as BayesTraits and StableTraits, which allow rates to vary during evolution, improve the predictions but are computationally expensive. Here, we develop Reconstructing Ancestral State under Pulsed Evolution in R by Gaussian Decomposition (RasperGade), a method of ancestral and hidden state prediction that uses a Lévy process to explicitly model gradual evolution, pulsed evolution, and time-independent variation. Using the same empirical data, we show that RasperGade outperforms both BayesTraits and StableTraits in providing reliable confidence estimates and is orders of magnitude faster. Our results suggest that, when predicting the ancestral and hidden states of continuous traits, rate variation should always be assessed and the quality of confidence estimates should always be examined.
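If a model's predictive variances are well calibrated, the residual Z-scores (observed minus predicted, divided by the predicted standard error) should follow a standard normal distribution. About 95% of them should then fall within ±1.96, so a "95% confidence interval" should cover the true value 95% of the time. The sketch below illustrates how such a coverage check can be computed. It also shows how a variance term could combine gradual (BM), pulsed, and time-independent components. The abstract does not give RasperGade's formulas, so this is a minimal illustration under stated assumptions. Pulses are modeled here as a compound Poisson process with rate `lam` and Normal(0, `sigma2_jump`) jump sizes, and time-independent variation is treated as a constant `epsilon2` added once. All function and parameter names are hypothetical and are not the API of the RasperGade R package.

```python
import math
from statistics import NormalDist


def levy_variance(t, sigma2_bm, lam, sigma2_jump, epsilon2=0.0):
    """Variance of trait change over a branch of length t (hypothetical model).

    Assumes BM with rate sigma2_bm plus a compound Poisson process whose
    jumps are Normal(0, sigma2_jump) and arrive at rate lam, plus a
    time-independent variance epsilon2 that does not scale with t.
    """
    gradual = sigma2_bm * t            # BM variance grows linearly with time
    pulsed = lam * t * sigma2_jump     # expected jump count (lam*t) times jump variance
    return gradual + pulsed + epsilon2


def residual_z_scores(observed, predicted, variances):
    """Standardized residuals: (observed - predicted) / predicted standard error."""
    return [(o - p) / math.sqrt(v) for o, p, v in zip(observed, predicted, variances)]


def coverage(z_scores, level=0.95):
    """Fraction of residuals that fall inside the nominal two-sided interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    inside = sum(1 for z in z_scores if abs(z) <= z_crit)
    return inside / len(z_scores)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.1, 1.5, 3.0, 6.0]
    var = [0.04] * 4
    z = residual_z_scores(obs, pred, var)
    print(coverage(z))  # 0.5: only 2 of 4 residuals lie within ±1.96
    print(levy_variance(2.0, 0.5, 0.1, 4.0, 0.25))
```

In a well-calibrated model the empirical coverage should be close to the nominal `level`. A value well below it corresponds to the "permissive" case described in the abstract, and a value near 1.0 corresponds to the "conservative" case.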
Keywords: Bacterial genomic traits; model misspecification; trait evolution.

MeSH terms

  • Animals
  • Body Size
  • Mammals*
  • Phenotype
  • Phylogeny
  • Time