Collision between biological process and statistical analysis revealed by mean centring

J Anim Ecol. 2020 Dec;89(12):2813-2824. doi: 10.1111/1365-2656.13360. Epub 2020 Nov 12.

Abstract

Animal ecologists often collect hierarchically structured data and analyse these with linear mixed-effects models. Specific complications arise when the effect sizes of covariates vary on multiple levels (e.g. within vs. among subjects). Mean centring of covariates within subjects offers a useful approach in such situations, but is not without problems. A statistical model represents a hypothesis about the underlying biological process. Mean centring within clusters assumes that the lower level responses (e.g. within subjects) depend on the deviation from the subject mean (relative) rather than on the absolute scale of the covariate. This may or may not be biologically realistic. We show that mismatch between the nature of the generating (i.e. biological) process and the form of the statistical analysis produces major conceptual and operational challenges for empiricists. We explored the consequences of mismatches by simulating data with three response-generating processes differing in the source of correlation between a covariate and the response. These data were then analysed with three different analysis equations. We asked how robustly different analysis equations estimate key parameters of interest and under which circumstances biases arise. Mismatches between generating and analytical equations created several intractable problems for estimating key parameters. The most widely misestimated parameter was the among-subject variance in response. We found that no single analysis equation was robust in estimating all parameters generated by all equations. Importantly, even when response-generating and analysis equations matched mathematically, bias in some parameters arose when sampling across the range of the covariate was limited. Our results have general implications for how we collect and analyse data. They also remind us more generally that conclusions from statistical analysis of data are conditional on a hypothesis, sometimes implicit, for the process(es) that generated the attributes we measure. We discuss strategies for real data analysis in the face of uncertainty about the underlying biological process.
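The within-subject mean centring described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation or analysis code. All names (`within_subject_centre`, `beta_within`, `beta_among`) and all numeric values are hypothetical. The sketch assumes a "relative" generating process, in which the response depends on the deviation from the subject mean, and it substitutes ordinary least squares for a linear mixed-effects model to stay short, so it recovers the two slopes but says nothing about variance components.

```python
import numpy as np


def within_subject_centre(x, subject):
    """Split a covariate into subject means and within-subject deviations."""
    x = np.asarray(x, dtype=float)
    # Map arbitrary subject labels to consecutive integer codes.
    _, codes = np.unique(subject, return_inverse=True)
    sums = np.bincount(codes, weights=x)
    counts = np.bincount(codes)
    subject_mean = (sums / counts)[codes]  # broadcast back to observations
    return subject_mean, x - subject_mean


# Hypothetical design: 200 subjects with 10 repeated measures each.
rng = np.random.default_rng(1)
n_subjects, n_obs = 200, 10
subject = np.repeat(np.arange(n_subjects), n_obs)

# Covariate varies both among subjects and within subjects.
x = rng.normal(0.0, 1.0, n_subjects)[subject] + rng.normal(0.0, 1.0, subject.size)
x_mean, x_dev = within_subject_centre(x, subject)

# Assumed "relative" generating process with different slopes per level.
beta_among, beta_within = 1.5, 0.5
y = beta_among * x_mean + beta_within * x_dev + rng.normal(0.0, 0.1, subject.size)

# Analysis equation that matches the generating process:
# intercept + among-subject slope + within-subject slope (plain OLS here).
design = np.column_stack([np.ones_like(x), x_mean, x_dev])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print("among-subject slope estimate:", coef[1])
print("within-subject slope estimate:", coef[2])
```

Because the analysis equation here matches the assumed generating process, both slopes should be recovered closely. Generating `y` from the absolute covariate instead, while keeping the same centred analysis, is one way to explore the kind of mismatch the abstract discusses.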

Keywords: bivariate models; environmental effects; hierarchical causation; linear mixed-effects models; model design; parameter misestimation; phenotypic plasticity.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Biological Phenomena*
  • Linear Models
  • Models, Statistical*

Associated data

  • Dryad/10.5061/dryad.sj3tx9632