Joint genotype calling with array and sequence data

Genet Epidemiol. 2012 Sep;36(6):527-37. doi: 10.1002/gepi.21657. Epub 2012 Jul 20.

Abstract

Analysis of rare variants is currently a major focus of genetic studies of human disease. Single-nucleotide polymorphism (SNP) genotypes can be assayed using microarray genotyping or by sequencing, but neither technology produces perfect genotype calls, especially at rare SNPs. Studies that collect both types of data are becoming increasingly common, so it may be possible to combine data types to increase accuracy. We present a method, called Chiamante, which calls genotypes for individuals with array data, sequence data, or both. The model adapts to data quality and can estimate when either the array or the sequence data should be ignored when calling the genotypes at each SNP. As a special case, our method can call genotypes from array data alone, and it outperforms existing methods in this scenario. We have applied our method to array and sequence data from Phase I of the 1000 Genomes Project and show that it provides improved performance, especially at rare SNPs. This method provides a foundation for future efforts to fuse genetic data from different sources, for example, when combining data from exome sequencing and exome microarrays.
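The general idea of combining the two data types can be illustrated with a minimal sketch. This is not the Chiamante model, whose details are not given in the abstract; it is a generic application of Bayes' theorem that multiplies per-genotype likelihoods from each data source with a prior. Everything in it is an assumption made for illustration: the function names, the Hardy-Weinberg prior, the binomial read model with a fixed error rate, and the Gaussian model for a normalized array intensity ratio.

```python
import math

# Genotypes are coded as the number of alternate alleles: 0, 1, or 2.

def hwe_prior(alt_freq):
    """Hypothetical prior: Hardy-Weinberg proportions for a given alt allele frequency."""
    p = alt_freq
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def sequence_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Assumed binomial read model; the binomial coefficient is dropped
    because it is the same for all three genotypes."""
    liks = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alt allele under genotype g.
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        liks.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return liks

def array_likelihoods(intensity_ratio, centers=(0.0, 0.5, 1.0), sd=0.1):
    """Assumed Gaussian cluster model for a normalized array intensity ratio;
    the cluster centers and standard deviation are illustrative values."""
    return [math.exp(-0.5 * ((intensity_ratio - c) / sd) ** 2) for c in centers]

def call_genotype(prior, *likelihood_sets):
    """Multiply the prior by each available likelihood set, normalize, and
    return (most probable genotype, posterior). A source passed as None
    is treated as missing and skipped."""
    post = list(prior)
    for liks in likelihood_sets:
        if liks is None:
            continue
        post = [p * l for p, l in zip(post, liks)]
    total = sum(post)
    post = [p / total for p in post]
    best = max(range(3), key=lambda g: post[g])
    return best, post

# Example: a rare SNP (alt frequency 1%) with low-coverage sequence data.
prior = hwe_prior(0.01)
seq = sequence_likelihoods(ref_reads=2, alt_reads=1)
arr = array_likelihoods(intensity_ratio=0.48)

seq_only_call, _ = call_genotype(prior, seq)
joint_call, joint_post = call_genotype(prior, seq, arr)
array_only_call, _ = call_genotype(prior, None, arr)
```

In this toy example, three reads are not enough to overcome the rare-variant prior, so the sequence-only call is homozygous reference, while adding the array evidence moves the call to heterozygous. The sketch only skips a source when it is absent; it does not attempt what the abstract describes, namely estimating from the data when one source should be ignored at a given SNP.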

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Genetic Variation
  • Genotype
  • Human Genome Project
  • Humans
  • Models, Genetic*
  • Oligonucleotide Array Sequence Analysis / methods
  • Polymorphism, Single Nucleotide*
  • Rare Diseases
  • Sequence Analysis, DNA / methods*
  • Software*