Inference after variable selection using restricted permutation methods

Can J Stat. 2009 Dec 1;37(4):625-644. doi: 10.1002/cjs.10039.

Abstract

When confronted with multiple covariates and a response variable, analysts sometimes apply a variable-selection algorithm to the covariate-response data to identify a subset of covariates potentially associated with the response, and then wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, the parameters of interest could be estimated by using standard inference methods to fit the postulated marginal model to the independent data set. However, when applied to the same data set used by the variable selector, standard ("naive") methods can lead to distorted inferences. The authors develop testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and response variable, based on the same data set used for variable selection. They provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess and compare their performance to a sample-splitting approach. The methods are illustrated with data from a recent AIDS study.