SNP characteristics predict replication success in association studies

Hum Genet. 2014 Dec;133(12):1477-86. doi: 10.1007/s00439-014-1493-6. Epub 2014 Oct 2.

Abstract

Successful independent replication is the most direct approach for distinguishing real genotype-disease associations from false discoveries in genome-wide association studies (GWAS). Selection of SNPs for replication has been based primarily on P values from the discovery stage, although additional characteristics of SNPs may be used to improve replication success. We used disease-associated SNPs from more than 2,000 published GWAS to identify predictors of SNP reproducibility. SNP reproducibility was defined as the proportion of successful replications among all replication attempts. The study reporting an association for the first time was considered the discovery study, and all subsequent studies targeting the same phenotype were considered replications. We found that -Log(P), where P is the P value from the discovery study, is the strongest predictor of SNP reproducibility. Other significant predictors include the type of SNP (e.g., missense vs. intronic SNPs) and minor allele frequency. Features of the genes linked to the disease-associated SNP also predict SNP reproducibility. Based on empirically defined rules, we developed a reproducibility score (RS) to predict SNP reproducibility independently of -Log(P). We used data from two lung cancer GWAS as well as recently reported disease-associated SNPs to validate RS. Minus Log(P) outperforms RS when only the very top SNPs are selected, whereas RS works better with relaxed selection criteria. In conclusion, we propose an empirical model to predict SNP reproducibility, which can be used to select SNPs for validation and prioritization.
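The reproducibility definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The record layout, the SNP identifiers, the P values, and the 0.05 threshold for calling a replication successful are all assumptions made for illustration, as is the choice of base 10 for the logarithm; the abstract does not specify any of these.

```python
import math
from collections import defaultdict

# Hypothetical reports: (snp_id, phenotype, publication_order, p_value).
# Every value here is invented for illustration; none comes from the paper.
REPORTS = [
    ("rs_example_1", "lung cancer", 1, 1e-9),
    ("rs_example_1", "lung cancer", 2, 0.003),
    ("rs_example_1", "lung cancer", 3, 0.20),
    ("rs_example_2", "lung cancer", 1, 5e-8),
    ("rs_example_2", "lung cancer", 2, 0.40),
]

# Assumed success criterion for a replication attempt (not from the paper).
REPLICATION_ALPHA = 0.05


def summarize(reports, alpha=REPLICATION_ALPHA):
    """Return -log10(discovery P) and reproducibility per (SNP, phenotype)."""
    grouped = defaultdict(list)
    for snp, phenotype, order, p_value in reports:
        grouped[(snp, phenotype)].append((order, p_value))

    summary = {}
    for key, studies in grouped.items():
        studies.sort()  # earliest report first
        discovery_p = studies[0][1]  # first report is the discovery study
        replications = studies[1:]   # later reports are replication attempts
        if not replications:
            continue  # reproducibility is undefined without any attempt
        successes = sum(1 for _, p in replications if p < alpha)
        summary[key] = {
            "neg_log10_p": -math.log10(discovery_p),
            "reproducibility": successes / len(replications),
        }
    return summary


SUMMARY = summarize(REPORTS)
```

In this toy input, the first SNP has two replication attempts of which one passes the assumed threshold, and the second SNP has a single failed attempt. SNPs with no replication attempts are skipped because the proportion would be undefined.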

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Gene Frequency
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Lung Neoplasms / genetics
  • Open Reading Frames
  • Polymorphism, Single Nucleotide*
  • Reproducibility of Results