Five myths about variable selection

Transpl Int. 2017 Jan;30(1):6-10. doi: 10.1111/tri.12895.

Abstract

Multivariable regression models are often used in transplantation research to identify or confirm baseline variables that have an independent association with transplantation outcome, whether causal or merely evidenced by statistical correlation. Although sound theory is lacking, variable selection is a popular statistical method that seemingly reduces the complexity of such models. In fact, however, variable selection often complicates analysis because it invalidates common tools of statistical inference such as P-values and confidence intervals. This is a particular problem in transplantation research, where sample sizes are often only small to moderate. Furthermore, variable selection requires computer-intensive stability investigations and a particularly cautious interpretation of results. We discuss how five common misconceptions often lead to inappropriate application of variable selection. We emphasize that variable selection, and all the problems associated with it, can often be avoided by the use of expert knowledge.
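The following is a minimal illustrative sketch, not the authors' method. It shows two points from the abstract. The first is that a P-value reported after a selection step is no longer a valid P-value. The second is what a simple computer-intensive stability check can look like. To stay short and dependency-free, it uses a deliberately simplified selection rule: univariable screening with a Fisher z test on the Pearson correlation. It does not use multivariable backward elimination. All function names and default settings (`n=50`, `k=10`, `alpha=0.05`, the bootstrap count) are hypothetical choices for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(x, y):
    """Two-sided P-value for H0: rho = 0 via the Fisher z approximation."""
    z = math.atanh(pearson_r(x, y)) * math.sqrt(len(x) - 3)
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def selected_pvalue_error_rate(n=50, k=10, reps=1000, alpha=0.05, seed=1):
    """Share of pure-noise datasets in which the 'best' of k noise
    predictors is reported as significant (hypothetical simulation)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        pvals = [fisher_z_pvalue([rng.gauss(0, 1) for _ in range(n)], y)
                 for _ in range(k)]
        # Selection step: keep only the predictor with the smallest P-value,
        # then (wrongly) interpret its nominal P-value as if nothing was selected.
        if min(pvals) < alpha:
            hits += 1
    return hits / reps


def bootstrap_inclusion_frequencies(xs, y, n_boot=200, alpha=0.05, seed=2):
    """Stability check: share of bootstrap resamples in which each predictor
    passes the (simplified, univariable) selection rule."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(xs)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        yb = [y[i] for i in idx]
        for j, x in enumerate(xs):
            if fisher_z_pvalue([x[i] for i in idx], yb) < alpha:
                counts[j] += 1
    return [c / n_boot for c in counts]


if __name__ == "__main__":
    print("no selection (k=1):", selected_pvalue_error_rate(k=1))
    print("best of k=10:      ", selected_pvalue_error_rate(k=10))
```

With `k=1` the false-positive rate should stay near the nominal 0.05. With `k=10` independent noise predictors it should land in the region of 1 − 0.95^10 ≈ 0.40, even though no predictor is related to the outcome. Predictors whose bootstrap inclusion frequency is well below 1 are ones whose selection is unstable and should be interpreted cautiously.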

Keywords: association; explanatory models; multivariable modeling; prediction; statistical analysis.

Publication types

  • Review

MeSH terms

  • Computers
  • Data Interpretation, Statistical
  • Humans
  • Models, Statistical
  • Multivariate Analysis
  • Regression Analysis*
  • Research Design*
  • Sample Size
  • Software
  • Transplantation / methods*