Log-transformation of Independent Variables: Must We?

Epidemiology. 2022 Nov 1;33(6):843-853. doi: 10.1097/EDE.0000000000001534. Epub 2022 Oct 5.

Abstract

Epidemiologic studies often quantify exposure using biomarkers, which commonly have skewed distributions. Although a normality assumption is not required when the biomarker is used as an independent variable in linear regression, it has become common practice to log-transform biomarker concentrations. This transformation can be motivated by concerns about a nonlinear dose-response relationship or outliers; however, such a transformation may not always reduce bias. In this study, we used simulations to evaluate the validity of the motivations underlying the decision to log-transform an independent variable, considering eight scenarios that can give rise to skewed X and normal Y. Our simulation study demonstrates that (1) if the skewness of exposure did not arise from a biasing factor (e.g., measurement error), the analytic approach with the best overall model fit best reflected the underlying outcome-generating methods and was least biased, regardless of the skewness of X, and (2) all estimates were biased if the skewness of exposure was a consequence of a biasing factor. We additionally illustrate, using data from NHANES, a process for determining whether transformation of an independent variable is needed. Our findings, and our suggestion to divorce the shape of the exposure distribution from the decision to log-transform it, may aid researchers in planning analyses that use biomarkers or other skewed independent variables.
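The fit-based reasoning in finding (1) can be illustrated with a minimal sketch. It is not the authors' simulation and covers none of their eight scenarios. It assumes a lognormal exposure, a normally distributed error term, and no measurement error or other biasing factor. The function names (`fit_ols`, `compare_fits`), the effect size `beta`, and the sample size are hypothetical choices made for illustration. Because both candidate models have the same number of parameters and the same outcome, comparing residual sums of squares here is equivalent to comparing R-squared or AIC.

```python
import numpy as np


def fit_ols(x, y):
    """Fit y ~ intercept + slope * x; return (slope, residual sum of squares)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(coef[1]), float(resid @ resid)


def compare_fits(true_form, n=5000, beta=0.5, seed=0):
    """Simulate a skewed exposure and fit both candidate models.

    true_form: "linear" if Y depends on X, "log" if Y depends on log(X).
    All numeric settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # right-skewed exposure
    predictor = x if true_form == "linear" else np.log(x)
    y = 1.0 + beta * predictor + rng.normal(0.0, 1.0, size=n)

    slope_raw, rss_raw = fit_ols(x, y)          # model using untransformed X
    slope_log, rss_log = fit_ols(np.log(x), y)  # model using log(X)
    return {
        "slope_raw": slope_raw,
        "rss_raw": rss_raw,
        "slope_log": slope_log,
        "rss_log": rss_log,
    }


if __name__ == "__main__":
    for form in ("linear", "log"):
        result = compare_fits(form)
        better = "raw X" if result["rss_raw"] < result["rss_log"] else "log X"
        print(f"true form = {form}: better-fitting model uses {better}")
```

In both runs the exposure is equally skewed; only the outcome-generating form differs, and the better-fitting model is the one that matches it. This sketch does not cover finding (2), where skewness arises from a biasing factor and model fit alone would not protect against bias.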

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bias
  • Biomarkers
  • Computer Simulation
  • Humans
  • Linear Models
  • Nutrition Surveys*

Substances

  • Biomarkers