Direct effects testing: a two-stage procedure to test for effect size and variable importance for correlated binary predictors and a binary response

Stat Med. 2010 Oct 30;29(24):2544-56. doi: 10.1002/sim.4014.

Abstract

In applications such as medical statistics and genetics, we encounter situations where a large number of highly correlated predictors explain a response. For example, the response may be a disease indicator and the predictors may be treatment indicators or single nucleotide polymorphisms (SNPs). Constructing a good predictive model in such cases is well studied. Less well understood is how to recover the 'true sparsity pattern', that is, how to find which predictors have direct effects on the response and how to indicate the statistical significance of the results. Restricting attention to binary predictors and a binary response, we study the recovery of the true sparsity pattern using a two-stage method that separates establishing the presence of effects from inferring their exact relationship with the predictors. Simulations and a real data application demonstrate that the method discriminates well between associations and direct effects. Comparisons with lasso-based methods demonstrate favourable performance of the proposed method.
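
The distinction between an association and a direct effect can be made concrete with a small worked example. The sketch below is not the two-stage procedure of the paper, whose details the abstract does not give. It is only an illustration under assumed, hypothetical probabilities: a binary predictor x1 affects the response y directly, while a second binary predictor x2 is merely correlated with x1. The names P_X1, P_X2_GIVEN_X1, P_Y_GIVEN_X1, joint and odds_ratio_x2 are all invented for this example.

```python
from itertools import product

# Hypothetical population (all numbers are assumptions for illustration):
# x1 has a direct effect on y; x2 is correlated with x1 but, given x1,
# carries no further information about y.
P_X1 = 0.5                         # P(x1 = 1)
P_X2_GIVEN_X1 = {1: 0.8, 0: 0.2}   # P(x2 = 1 | x1)
P_Y_GIVEN_X1 = {1: 0.6, 0: 0.2}    # P(y = 1 | x1), independent of x2


def joint(x1, x2, y):
    """Exact joint probability P(x1, x2, y) under the assumptions above."""
    p = P_X1 if x1 else 1 - P_X1
    p *= P_X2_GIVEN_X1[x1] if x2 else 1 - P_X2_GIVEN_X1[x1]
    p *= P_Y_GIVEN_X1[x1] if y else 1 - P_Y_GIVEN_X1[x1]
    return p


def odds_ratio_x2(x1_values):
    """Odds ratio between x2 and y, pooling over the given x1 values."""
    cell = {
        (x2, y): sum(joint(x1, x2, y) for x1 in x1_values)
        for x2, y in product((0, 1), repeat=2)
    }
    return (cell[1, 1] * cell[0, 0]) / (cell[1, 0] * cell[0, 1])


# Pooling over x1 shows an association between x2 and y.
marginal_or = odds_ratio_x2((0, 1))

# Within each stratum of x1 the association disappears.
conditional_ors = {x1: odds_ratio_x2((x1,)) for x1 in (0, 1)}

print(f"marginal odds ratio of x2 with y: {marginal_or:.3f}")
for x1, value in conditional_ors.items():
    print(f"odds ratio of x2 with y given x1={x1}: {value:.3f}")
```

In this toy population x2 is associated with y when x1 is ignored, but the odds ratio returns to 1 within each level of x1, so x2 has an association without a direct effect. Separating these two cases when there are many correlated predictors is the problem the abstract describes.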

MeSH terms

  • Age of Onset
  • Alcohol Drinking / epidemiology
  • Comorbidity
  • Coronary Disease / epidemiology
  • Data Interpretation, Statistical*
  • Genome-Wide Association Study / methods
  • Humans
  • Models, Statistical*
  • Obesity / epidemiology
  • Regression Analysis
  • Risk Factors
  • Rural Health / statistics & numerical data
  • Smoking / epidemiology
  • South Africa / epidemiology