Estimation and interpretation problems and solutions when using proportion covariates in linear regression models

Denis Valle; Jeffrey Mintz; Ismael Verrastro Brack

doi:10.1002/ecy.4256

Ecology. 2024 Apr;105(4):e4256. doi: 10.1002/ecy.4256. Epub 2024 Feb 15.

Authors

Denis Valle¹, Jeffrey Mintz², Ismael Verrastro Brack¹

Affiliations

¹ School of Forest, Fisheries, and Geomatics Sciences, University of Florida, Gainesville, Florida, USA.
² School of Natural Resources and Environment, University of Florida, Gainesville, Florida, USA.

PMID: 38361276
DOI: 10.1002/ecy.4256

Abstract

Proportion variables, also known as compositional data, are very common in ecology. Unfortunately, few scientists are aware of how compositional data, when used as covariates, can adversely impact statistical analysis. We describe here how proportion covariates result in multicollinearity and parameter identifiability problems. Using simulated data on bird species richness as a function of land use, we show how these problems manifest when fitting a wide range of models in R, in both frequentist and Bayesian frameworks. In particular, we show that similar models can often generate substantially different parameter estimates, leading to very different conclusions. Dropping a covariate or the intercept from the model can solve the multicollinearity and parameter identifiability problems. Unfortunately, these solutions do not fix the inherent challenges associated with interpreting parameter estimates. To this end, we propose focusing the interpretation on the difference of slope parameters to avoid the inherent unidentifiability of individual parameters. We also propose conditional plots with two x-axes and marginal plots as visualization techniques that can help users better interpret their modeling results. We illustrate these problems and proposed solutions using empirical data from the North American Breeding Bird Survey. The practical and straightforward approaches suggested in this article will help users fit linear models and interpret their results when some of the covariates are proportions.
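
The identifiability problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code (the article works in R); it is a minimal Python/NumPy illustration under stated assumptions: three hypothetical land-use proportions (forest, agriculture, urban) drawn from a Dirichlet distribution, and made-up coefficients used only to generate the response. It shows that an intercept plus all proportions yields a rank-deficient design matrix, that dropping either one covariate or the intercept restores full rank, and that the difference between two slopes is the same under both fixes.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical land-use proportions (forest, agriculture, urban); each row sums to 1.
props = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=n)

# Made-up coefficients, used only to simulate a richness-like response.
beta0 = 10.0
betas = np.array([8.0, 2.0, -4.0])
y = beta0 + props @ betas + rng.normal(scale=1.0, size=n)

# Full model: intercept + all three proportions.
# The proportion columns sum to the intercept column, so the rank is 3, not 4.
X_full = np.column_stack([np.ones(n), props])
rank_full = np.linalg.matrix_rank(X_full)

# Non-identifiability: adding c to the intercept and subtracting c from every
# slope leaves the fitted values unchanged.
coef_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)  # minimum-norm solution
c = 5.0
coef_shifted = coef_full + np.array([c, -c, -c, -c])
same_fit = np.allclose(X_full @ coef_full, X_full @ coef_shifted)

# Fix 1: drop one covariate (urban becomes the reference category).
X_drop = np.column_stack([np.ones(n), props[:, :2]])
rank_drop = np.linalg.matrix_rank(X_drop)
coef_drop, *_ = np.linalg.lstsq(X_drop, y, rcond=None)

# Fix 2: drop the intercept and keep all three proportions.
X_noint = props
rank_noint = np.linalg.matrix_rank(X_noint)
coef_noint, *_ = np.linalg.lstsq(X_noint, y, rcond=None)

# The individual slopes differ between the two fixes, but the difference
# between the forest and agriculture slopes is the same in both.
diff_drop = coef_drop[1] - coef_drop[2]
diff_noint = coef_noint[0] - coef_noint[1]

print("rank of full design:", rank_full, "of", X_full.shape[1], "columns")
print("shifted coefficients give same fit:", same_fit)
print("slopes, covariate dropped:", coef_drop[1:])
print("slopes, intercept dropped:", coef_noint)
print("forest - agriculture difference:", diff_drop, diff_noint)
```

In this sketch the two full-rank models report different individual slopes because each slope is measured against a different baseline, while the forest-minus-agriculture difference agrees, which is the quantity the abstract recommends interpreting.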

Keywords: compositional covariates; conditional plot; inference; linear model; marginal plot; multicollinearity; parameter identifiability; parameter interpretation.

MeSH terms

Bayes Theorem
Linear Models
Models, Statistical*
