Reconsidering association testing methods using single-variant test statistics as alternatives to pooling tests for sequence data with rare variants

PLoS One. 2012;7(2):e30238. doi: 10.1371/journal.pone.0030238. Epub 2012 Feb 17.

Abstract

Association tests that pool minor alleles into a measure of burden at a locus have been proposed for case-control studies using sequence data containing rare variants. However, such pooling tests are not robust to the inclusion of neutral and protective variants, which can mask the association signal from risk variants. Early studies proposing pooling tests dismissed, on the basis of unrealistic comparisons, methods for locus-wide inference using nonnegative single-variant test statistics. However, such methods are robust to the inclusion of neutral and protective variants and therefore may be more useful than previously appreciated. In fact, some recently proposed methods derived within different frameworks are equivalent to performing inference on weighted sums of squared single-variant score statistics. In this study, we compared two existing methods for locus-wide inference using nonnegative single-variant test statistics to two widely cited pooling tests under more realistic conditions. We established analytic results for a simple model with one rare risk and one rare neutral variant, which demonstrated that pooling tests were less powerful than even Bonferroni-corrected single-variant tests in most realistic situations. We also performed simulations using variants with realistic minor allele frequency and linkage disequilibrium spectra, disease models with multiple rare risk variants and extensive neutral variation, and varying rates of missing genotypes. In all scenarios considered, existing methods using nonnegative single-variant test statistics had power comparable to or greater than two widely cited pooling tests. Moreover, in disease models with only rare risk variants, an existing method based on the maximum single-variant Cochran-Armitage trend chi-square statistic in the locus had power comparable to or greater than another existing method closely related to some recently proposed methods.
We conclude that efficient locus-wide inference using single-variant test statistics should be reconsidered as a useful framework for devising powerful association tests in sequence data with rare variants.
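The abstract does not include the authors' implementations, so the following is only a minimal Python sketch of the kinds of statistics it contrasts, assuming individual-level genotypes coded as minor-allele counts (0, 1, 2) and a binary case-control phenotype (1 = case). It computes the single-variant Cochran-Armitage trend chi-square in its N·r² form, the locus-wide maximum and an unweighted sum of those nonnegative statistics, a Bonferroni-corrected single-variant p-value using the asymptotic 1-df chi-square, a simple burden statistic that sums minor alleles across the locus, and Monte Carlo permutation p-values. All function names are hypothetical; the burden score is a generic illustration rather than either of the pooling tests the study evaluated, and the unweighted sum stands in for the weighted sums of squared score statistics mentioned above.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical layout: genotypes[i][j] = minor-allele count (0/1/2) of individual i at variant j.
Genotypes = List[List[int]]
Statistic = Callable[[Genotypes, Sequence[int]], float]


def trend_chisq(x: Sequence[float], y: Sequence[int]) -> float:
    """Cochran-Armitage trend chi-square written as N * r**2, where r is the Pearson
    correlation between the genotype score x and case status y."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic score or only one phenotype class: no information
    return n * sxy * sxy / (sxx * syy)


def single_variant_stats(g: Genotypes, y: Sequence[int]) -> List[float]:
    """One nonnegative trend statistic per variant (column) in the locus."""
    return [trend_chisq([row[j] for row in g], y) for j in range(len(g[0]))]


def max_stat(g: Genotypes, y: Sequence[int]) -> float:
    """Locus-wide maximum of the single-variant trend statistics."""
    return max(single_variant_stats(g, y))


def sum_stat(g: Genotypes, y: Sequence[int]) -> float:
    """Unweighted sum of the single-variant statistics (assumed weights of 1)."""
    return sum(single_variant_stats(g, y))


def burden_stat(g: Genotypes, y: Sequence[int]) -> float:
    """Illustrative pooling test: trend statistic on each person's total minor-allele count."""
    return trend_chisq([sum(row) for row in g], y)


def chisq1_sf(x: float) -> float:
    """P(X > x) for a 1-df chi-square variable, i.e. P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


def bonferroni_pvalue(g: Genotypes, y: Sequence[int]) -> float:
    """Bonferroni-corrected smallest asymptotic single-variant p-value."""
    stats = single_variant_stats(g, y)
    return min(1.0, len(stats) * chisq1_sf(max(stats)))


def permutation_pvalue(stat: Statistic, g: Genotypes, y: Sequence[int],
                       n_perm: int = 999, seed: int = 1) -> float:
    """Monte Carlo p-value obtained by shuffling case-control labels."""
    rng = random.Random(seed)
    observed = stat(g, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # small tolerance so floating-point ties count as exceedances
        if stat(g, y_perm) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 4 cases then 4 controls. Variant 0 is carried only by two cases
    # (risk-like), variant 1 only by two controls (protective-like).
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    g = [[1, 0], [1, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
    print("single-variant:", single_variant_stats(g, y))
    print("max:", max_stat(g, y), "sum:", sum_stat(g, y), "burden:", burden_stat(g, y))
    print("Bonferroni p:", bonferroni_pvalue(g, y))
    print("permutation p (max):", permutation_pvalue(max_stat, g, y))
```

In this toy locus the two variants push the pooled burden in opposite directions, so the burden score is identical in cases and controls and its statistic is 0, while each single-variant statistic equals 8/3. This is the masking effect described above in miniature. With only eight individuals the p-values are not meaningful; the example only shows how the statistics behave.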

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Base Sequence
  • Computer Simulation
  • Databases, Nucleic Acid*
  • Gene Frequency / genetics
  • Genetic Association Studies / methods*
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Humans
  • Linkage Disequilibrium / genetics
  • Models, Genetic
  • Models, Statistical*
  • Monte Carlo Method
  • Risk Factors