Rapid genotype imputation from sequence without reference panels

Nat Genet. 2016 Aug;48(8):965-969. doi: 10.1038/ng.3594. Epub 2016 Jul 4.

Abstract

Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, genotyping microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r² value of 0.98 in 2,073 outbred laboratory mice (0.15× sequencing coverage). In a sample of 11,670 Han Chinese (1.7× coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans.

MeSH terms

  • Algorithms*
  • Animals
  • Animals, Outbred Strains / genetics*
  • Asian People / genetics*
  • Computational Biology / methods*
  • Genetics, Population
  • Genotype
  • Haplotypes / genetics*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mice
  • Polymorphism, Single Nucleotide / genetics
  • Sequence Analysis, DNA / methods*