Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models

BMC Med Res Methodol. 2013 Jun 8;13:75. doi: 10.1186/1471-2288-13-75.

Abstract

Background: In statistical modeling, finding the most favorable coding for an explanatory quantitative variable involves many tests. This creates a multiple testing problem and requires correction of the significance level.

Methods: For each coding, a test of the nullity of the coefficient associated with the newly coded variable is computed. The selected coding is the one associated with the largest test statistic (or, equivalently, the smallest p-value). In the context of the generalized linear model, Liquet and Commenges (Stat Probability Lett, 71:33-38, 2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, was developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels, which by definition involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect of an explanatory variable on a dependent variable.
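
To make the idea of a resampling-based correction concrete, the following is a minimal sketch in Python and not the CPMCGLM implementation. It assumes a Gaussian model with identity link and no adjustment covariates, so that permuting the response is a valid way to simulate the null hypothesis. It also assumes quantile-based categorical codings and uses n·R² as the score statistic. The function names (`quantile_dummies`, `score_stat`, `min_p_permutation`) are hypothetical. Because codings with different numbers of levels have different degrees of freedom, each statistic is first converted to a permutation p-value, and the smallest p-value is then compared with its own permutation distribution.

```python
import numpy as np


def quantile_dummies(x, n_levels):
    """Code x into n_levels quantile groups; return dummies (reference level dropped)."""
    cuts = np.quantile(x, np.linspace(0, 1, n_levels + 1)[1:-1])
    groups = np.searchsorted(cuts, x, side="right")  # group index 0..n_levels-1
    return (groups[:, None] == np.arange(1, n_levels)).astype(float)


def score_stat(y, Z):
    """n * R^2 from regressing centred y on centred Z.

    Assumption: Gaussian model, intercept only under H0 (all coefficients zero).
    """
    yc = y - y.mean()
    Zc = Z - Z.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Zc, yc, rcond=None)
    fitted = Zc @ beta
    return len(y) * (fitted @ fitted) / (yc @ yc)


def min_p_permutation(y, codings, n_perm=500, seed=0):
    """Return (per-coding permutation p-values, p-value corrected for coding selection)."""
    rng = np.random.default_rng(seed)
    rows = [[score_stat(y, Z) for Z in codings]]  # row 0: observed data
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # breaks any link between y and x
        rows.append([score_stat(y_perm, Z) for Z in codings])
    stats = np.array(rows)  # shape (n_perm + 1, number of codings)

    # Per-coding p-value of every row: share of rows with a statistic at least as large.
    total = stats.shape[0]
    pvals = np.empty_like(stats)
    for k in range(stats.shape[1]):
        ordered = np.sort(stats[:, k])
        n_smaller = np.searchsorted(ordered, stats[:, k], side="left")
        pvals[:, k] = (total - n_smaller) / total

    naive = pvals[0]            # uncorrected p-values, one per coding
    min_p = pvals.min(axis=1)   # smallest p-value in each row
    adjusted = np.mean(min_p <= min_p[0])  # the observed row counts itself
    return naive, adjusted


# Usage on simulated data with no true effect (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.normal(size=200)
codings = [quantile_dummies(x, m) for m in (2, 3, 4)]
naive, adjusted = min_p_permutation(y, codings, n_perm=300)
print("smallest uncorrected p-value:", naive.min())
print("corrected p-value:", adjusted)
```

In this sketch the corrected p-value can never fall below the smallest uncorrected one, which reflects the cost of having searched over several codings. With adjustment covariates or a non-Gaussian response, plain permutation of the response is no longer appropriate and a different resampling scheme would be needed.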

Results: The simulations run in this study showed good performance of the proposed methods. The methods were illustrated using data from a study of the relationship between cholesterol and dementia.

Conclusion: The algorithms were implemented in R, and the associated CPMCGLM R package is available on CRAN.

MeSH terms

  • Aged
  • Algorithms
  • Cholesterol, HDL / blood
  • Computer Simulation*
  • Data Interpretation, Statistical
  • Dementia / blood
  • Epidemiologic Factors
  • Epidemiologic Research Design*
  • Humans
  • Linear Models*
  • Multivariate Analysis
  • Reproducibility of Results
  • Risk Factors
  • Sample Size

Substances

  • Cholesterol, HDL