Quality control for genome-wide association studies

Methods Mol Biol. 2013;1019:129-47. doi: 10.1007/978-1-62703-447-0_5.

Abstract

This chapter gives an overview of the quality control (QC) issues for SNP-based genotyping methods used in genome-wide association studies. The main metrics for evaluating genotype quality are discussed. A worked example of a QC pipeline follows, starting with raw data and finishing with a fully filtered dataset ready for downstream analysis. The emphasis is on two things: automating data storage, filtering, and manipulation to ensure data integrity throughout the process, and extracting a global summary from these high-dimensional datasets to allow better-informed downstream analytical decisions. All examples are run in the R statistical programming language. A practical example then uses a fully automated QC pipeline for the Illumina platform.
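This abstract does not list the specific metrics or thresholds the chapter uses, and the chapter's own examples are written in R. The Python sketch below is an illustration only. It computes three per-SNP metrics commonly used in GWAS QC (call rate, minor allele frequency, and a Hardy-Weinberg equilibrium test) and then applies a simple filter. Several details are assumptions and are not the chapter's method:

  • the 0/1/2 genotype coding, with None marking a missing call;
  • the function names;
  • the thresholds;
  • the use of a 1-df chi-square HWE test instead of an exact test.

```python
import math
from typing import Optional, Sequence

# Assumed coding: genotypes are alternate-allele counts (0, 1, 2); None = missing call.
Genotype = Optional[int]

# Hypothetical thresholds for illustration only; not taken from the chapter.
MIN_CALL_RATE = 0.95
MIN_MAF = 0.01
MIN_HWE_P = 1e-6


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes) if genotypes else 0.0


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def hwe_chisq_pvalue(genotypes: Sequence[Genotype]) -> float:
    """1-df chi-square test for Hardy-Weinberg equilibrium (not the exact test)."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (2 * observed[0] + observed[1]) / (2 * n)  # frequency of the allele coded 0
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: test undefined, leave it to the MAF filter
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(genotypes: Sequence[Genotype]) -> bool:
    """Keep a SNP only if it clears all three (hypothetical) thresholds."""
    return (
        call_rate(genotypes) >= MIN_CALL_RATE
        and minor_allele_frequency(genotypes) >= MIN_MAF
        and hwe_chisq_pvalue(genotypes) >= MIN_HWE_P
    )


if __name__ == "__main__":
    balanced = [0] * 25 + [1] * 50 + [2] * 25  # exactly at HWE expectations
    no_hets = [0] * 50 + [2] * 50              # strong heterozygote deficit
    print(passes_snp_qc(balanced), passes_snp_qc(no_hets))
```

In a real pipeline, each SNP's genotype vector would come from the genotyping platform's output. The filter order and thresholds would also be chosen per study. Sample-level QC, such as per-sample call rate and heterozygosity, would follow the same pattern applied across rows instead of columns.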

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms
  • Animals
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Information Storage and Retrieval
  • Polymorphism, Single Nucleotide*
  • Programming Languages*
  • Quality Control