Effect of genome-wide simultaneous hypotheses tests on the discovery rate

Int J Mol Epidemiol Genet. 2011;2(2):163-77. Epub 2011 May 5.

Abstract

An increasing number of genome-wide association studies are being performed on hundreds of thousands of single nucleotide polymorphisms (SNPs). Many such studies include a second stage in which a selected number of SNPs are genotyped in new individuals in order to validate the genome-wide findings. Unfortunately, a large proportion of these studies have been unable to validate their genome-wide findings. In this study we aim to better understand how to distinguish truly associated features from false positives in genome-wide scans. To this end, we use empirical data to examine three aspects that may play a key role in determining which features are called associated with the phenotype. First, we examine the usual assumption that null p-values follow a uniform distribution and assess whether it affects which features are called significant and how many. Second, we compare the global behavior of the p-value distribution genome-wide with its local behavior in regions such as chromosomes. Third, we look at the effect of minor allele frequency on the p-value distribution. We show empirically that the uniform distribution is not a generally valid assumption, and we find that, as a consequence, strikingly different conclusions can be drawn regarding which associations are called significant and how many significant findings there are. We propose that, to better assign significance to potential associations, one needs to estimate the true distributions of null and non-null p-values.

Keywords: Genome-wide association study (GWAS); p-value distribution; single nucleotide polymorphisms (SNPs).
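
The abstract's central claim, that a non-uniform null p-value distribution changes how many features are called significant, can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' analysis: it applies the Benjamini-Hochberg step-up rule (chosen here only as a familiar discovery-rate procedure that presumes uniform null p-values) to purely null data. The skewed null `U**1.5`, the function names, the number of tests, and the 0.05 level are all hypothetical choices made for illustration.

```python
import random


def benjamini_hochberg(pvalues, alpha=0.05):
    """Count rejections under the Benjamini-Hochberg step-up rule.

    The rule rejects the k smallest p-values, where k is the largest
    rank whose sorted p-value satisfies p_(k) <= alpha * k / m.
    """
    m = len(pvalues)
    n_reject = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * rank / m:
            n_reject = rank  # keep the largest rank that passes
    return n_reject


def simulate_null_pvalues(m, shape, rng):
    """Draw m null p-values as U**shape with U uniform on [0, 1).

    shape == 1 gives the textbook uniform null; shape > 1 is an
    illustrative (assumed) departure that piles p-values up near zero.
    """
    return [rng.random() ** shape for _ in range(m)]


rng = random.Random(0)   # fixed seed so the run is repeatable
m = 100_000              # hypothetical number of SNPs, none truly associated

uniform_null = simulate_null_pvalues(m, 1.0, rng)
skewed_null = simulate_null_pvalues(m, 1.5, rng)

# Every "discovery" below is a false positive, because all data are null.
print("discoveries, uniform null:", benjamini_hochberg(uniform_null))
print("discoveries, skewed null: ", benjamini_hochberg(skewed_null))
```

Because every simulated SNP is null, any difference between the two counts comes entirely from the shape of the null distribution, which is the kind of sensitivity the abstract describes. In practice one would estimate the null distribution from the data rather than assume a form for it.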