Variability in GWAS analysis: the impact of genotype calling algorithm inconsistencies

Pharmacogenomics J. 2010 Aug;10(4):324-35. doi: 10.1038/tpj.2010.46.

Abstract

The Genome-Wide Association Working Group (GWAWG) is part of a large-scale effort by the MicroArray Quality Control (MAQC) consortium to assess the quality of genomic experiments, technologies and analyses for genome-wide association studies (GWASs). One aim of the working group is to assess the variability of genotype calls within and between different genotype calling algorithms, using coronary artery disease data from the Wellcome Trust Case Control Consortium (WTCCC) and the University of Ottawa Heart Institute. Our results show that, for the Affymetrix 500K array, the choice of genotype calling algorithm can introduce marked variability in the results of downstream case-control association analysis. Examples of such algorithms are the Bayesian robust linear model with Mahalanobis distance classifier (BRLMM), the corrected robust linear model with maximum-likelihood-based distances (CRLMM) and CHIAMO, which was developed and implemented by the WTCCC. The amount of discordance between results depends on how samples are combined and processed through each calling algorithm, indicating that systematic genotype errors caused by computational batch effects propagate into the list of single-nucleotide polymorphisms (SNPs) found to be significantly associated with the trait of interest. Further work using HapMap samples shows that inconsistencies between Affymetrix arrays and calling algorithms can lead to genotyping errors that influence downstream analysis.
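To make the idea of call discordance propagating into association results concrete, the following Python sketch compares two genotype call sets for the same samples, runs a per-SNP allelic chi-square test on each, and measures how much the resulting significant-SNP lists overlap. It is not the analysis pipeline used in the study: the 0/1/2 genotype coding, the Pearson allelic test without continuity correction, the 0.05 threshold, the Jaccard overlap measure, and the toy genotypes for two hypothetical callers ("caller A" and "caller B") are all illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Set

# Assumed coding: number of copies of allele B (0, 1 or 2); None is a no-call.
Genotype = Optional[int]


def call_concordance(calls_a: Sequence[Genotype], calls_b: Sequence[Genotype]) -> float:
    """Fraction of genotypes called by BOTH algorithms that agree."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


def allelic_test_pvalue(genotypes: Sequence[Genotype], is_case: Sequence[bool]) -> float:
    """P-value of a Pearson chi-square test (1 df) on a 2x2 status-by-allele table."""
    counts = [[0, 0], [0, 0]]  # counts[row][allele]: row 0 = control, 1 = case
    for g, case in zip(genotypes, is_case):
        if g is None:
            continue  # no-calls are dropped from the test
        row = 1 if case else 0
        counts[row][1] += g       # copies of allele B
        counts[row][0] += 2 - g   # copies of allele A
    total = sum(map(sum, counts))
    row_sums = [sum(r) for r in counts]
    col_sums = [counts[0][j] + counts[1][j] for j in range(2)]
    if total == 0 or 0 in row_sums or 0 in col_sums:
        return 1.0  # test undefined (e.g. monomorphic SNP); treat as not significant
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (counts[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def significant_snps(calls: Dict[str, List[Genotype]], is_case: Sequence[bool],
                     alpha: float = 0.05) -> Set[str]:
    """SNP identifiers whose allelic test p-value falls below alpha."""
    return {snp for snp, g in calls.items() if allelic_test_pvalue(g, is_case) < alpha}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two significant-SNP lists (1.0 = identical lists)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy data: 4 cases followed by 4 controls, two SNPs, two callers.
is_case = [True] * 4 + [False] * 4
calls_a = {
    "snp_1": [2, 2, 2, 2, 0, 0, 0, 0],
    "snp_2": [2, 2, 2, 1, 0, 0, 1, 0],
}
calls_b = {
    "snp_1": [2, 2, 2, None, 0, 0, 0, 0],  # one no-call, otherwise identical
    "snp_2": [1, 1, 1, 1, 1, 0, 1, 0],      # systematic shift in the cases
}

if __name__ == "__main__":
    for snp in calls_a:
        print(snp, "concordance:", call_concordance(calls_a[snp], calls_b[snp]))
    sig_a = significant_snps(calls_a, is_case)
    sig_b = significant_snps(calls_b, is_case)
    print("caller A:", sorted(sig_a), "caller B:", sorted(sig_b),
          "overlap:", jaccard(sig_a, sig_b))
```

In the toy data, caller B calls three of the four cases differently at the second SNP, which mimics a systematic (batch-like) error; that SNP's association signal disappears, so the two significant lists diverge even though both callers agree on the first SNP.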

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Data Interpretation, Statistical
  • Databases, Genetic
  • Genome-Wide Association Study / statistics & numerical data*
  • Genotype*
  • Heart Diseases / genetics
  • Humans
  • Linear Models
  • Oligonucleotide Array Sequence Analysis / standards
  • Polymorphism, Single Nucleotide
  • Quality Control
  • Reference Standards