Regression with Highly Correlated Predictors: Variable Omission Is Not the Solution

Int J Environ Res Public Health. 2021 Apr 17;18(8):4259. doi: 10.3390/ijerph18084259.

Abstract

Regression models have been used for decades in environmental sciences, epidemiology and public health to explore and quantify the association between a dependent response and several independent variables. However, researchers often encounter situations in which some independent variables exhibit high bivariate correlation, or may even be collinear. Improper statistical handling of this situation will most certainly generate models of little or no practical use and lead to misleading interpretations. By means of two example studies, we demonstrate how diagnostic tools for collinearity or near-collinearity may fail to guide the analyst. Instead, the most appropriate way of handling collinearity should be driven by the research question at hand and, in particular, by the distinction between predictive and explanatory aims.
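The abstract does not name specific collinearity diagnostics. A widely used one is the variance inflation factor (VIF), chosen here as an assumption for illustration. For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. The sketch below computes VIFs with plain NumPy on simulated data. The function name `vif` and the simulated variables `x1`, `x2`, `x3` are hypothetical. This is not the authors' analysis.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n x p, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination
    from an ordinary least-squares regression of column j on all other
    columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every predictor except column j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical simulated predictors: x1 and x2 have a true correlation of 0.95,
# and x3 is independent of both.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

print(np.round(vif(X), 2))
```

With a correlation near 0.95, the VIFs for `x1` and `x2` should come out close to 1 / (1 - 0.95^2), roughly 10. The VIF for `x3` should stay near 1. A diagnostic like this flags that the two predictors are nearly collinear. In line with the abstract's argument, it does not by itself say whether to drop, combine, or keep them. That choice depends on whether the aim is prediction or explanation.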

Keywords: collinearity; correlated predictors; exposure-response association; multivariable modelling; nonlinear effects.

Publication types

  • Research Support, Non-U.S. Gov't