Phenomenological Load on Model Parameters Can Lead to False Biological Conclusions

Mol Biol Evol. 2018 Jun 1;35(6):1473-1488. doi: 10.1093/molbev/msy049.

Abstract

When a substitution model is fitted to an alignment using maximum likelihood, its parameters are adjusted to account for as much site-pattern variation as possible. A parameter might therefore absorb a substantial quantity of the total variance in an alignment (or more formally, bring about a substantial reduction in the deviance of the fitted model) even if the process it represents played no role in the generation of the data. When this occurs, we say that the parameter estimate carries phenomenological load (PL). Large PL in a parameter estimate is a concern because it not only invalidates its mechanistic interpretation (if it has one) but also increases the likelihood that it will be found to be statistically significant. The problem of PL was not identified in the past because most off-the-shelf substitution models make simplifying assumptions that preclude the generation of realistic levels of variation. In this study, we use the more realistic mutation-selection framework as the basis of a generating model formulated to produce data that mimic an alignment of mammalian mitochondrial DNA. We show that a parameter estimate can carry PL when 1) the substitution model is underspecified and 2) the parameter represents a process that is confounded with other processes represented in the data-generating model. We then provide a method that can be used to identify signal for the process that a given parameter represents despite the existence of PL.
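As a rough illustration of the "reduction in the deviance" mentioned above, the sketch below compares two nested fits with a likelihood ratio statistic. It is not the method of the paper: the function names and the log-likelihood values are hypothetical, and the chi-square reference distribution with one degree of freedom is the usual asymptotic assumption for a single added parameter.

```python
import math


def deviance_reduction(lnl_null: float, lnl_alt: float) -> float:
    """Drop in deviance when moving from the null fit to the richer fit.

    Deviance is -2 * lnL up to a constant, so the reduction is
    2 * (lnL_alt - lnL_null).
    """
    return 2.0 * (lnl_alt - lnl_null)


def lrt_pvalue_1df(delta_deviance: float) -> float:
    """Upper-tail probability of a chi-square(1) variate (assumed null distribution)."""
    if delta_deviance <= 0.0:
        return 1.0
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(delta_deviance / 2.0))


# Hypothetical maximized log-likelihoods, made up for illustration only.
lnl_without_parameter = -12345.6
lnl_with_parameter = -12340.1

delta = deviance_reduction(lnl_without_parameter, lnl_with_parameter)
p_value = lrt_pvalue_1df(delta)
print(f"deviance reduction = {delta:.2f}, p = {p_value:.5f}")
```

A small p-value here only says that the extra parameter absorbed variation in the alignment. As the abstract argues, that alone does not show that the process the parameter represents helped generate the data.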

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA, Mitochondrial
  • Evolution, Molecular
  • Likelihood Functions
  • Mammals / genetics*
  • Models, Genetic*
  • Mutation*
  • Selection, Genetic*
  • Sequence Alignment
  • Silent Mutation*

Substances

  • DNA, Mitochondrial