Accurate Imputation of Untyped Variants from Deep Sequencing Data

Methods Mol Biol. 2021;2243:271-281. doi: 10.1007/978-1-0716-1103-6_13.

Abstract

The quality, statistical power, and resolution of genome-wide association studies (GWAS) depend largely on the comprehensiveness of the genotypic data. Although the cost of sequencing has decreased steadily over the last few years, whole-genome sequencing (WGS) of association panels comprising a large number of samples remains cost-prohibitive. Therefore, most GWAS populations are still genotyped using low-coverage genotyping methods, resulting in incomplete datasets. Imputation of untyped variants is a powerful method to maximize the number of SNPs identified in study samples: it increases the power and resolution of GWAS and allows genotyping datasets obtained from various sources to be integrated. Here, we describe the key concepts underlying imputation of untyped variants, including the architecture of reference panels, and review some of the associated challenges and how these can be addressed. We also discuss why the accuracy of imputed data must be rigorously assessed before they are used in any genetic study, and the methods available to do so.
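A common way to check imputed data is to mask genotypes that were actually observed (for example, by deep sequencing), impute them as if they were untyped, and compare the imputed values with the truth. This may differ from the specific protocol the chapter describes. The sketch below computes two widely used summary metrics: genotype concordance and dosage r² (the squared Pearson correlation between true genotypes and imputed dosages). It assumes genotypes are coded as alternate-allele counts (0, 1, 2) and that the imputation tool reports a dosage between 0.0 and 2.0; the function names are hypothetical.

```python
"""Sketch: summarize imputation accuracy at masked (held-out) genotypes.

Assumptions (hypothetical, for illustration): true genotypes are coded as
alternate-allele counts (0, 1, 2) and imputed values are dosages in [0.0, 2.0].
"""
from math import nan
from typing import Sequence


def _check(true_gt: Sequence[int], dosage: Sequence[float]) -> None:
    if not true_gt or len(true_gt) != len(dosage):
        raise ValueError("inputs must be non-empty and of equal length")


def concordance(true_gt: Sequence[int], dosage: Sequence[float]) -> float:
    """Fraction of masked genotypes whose best-guess call matches the truth.

    The best-guess call is the dosage rounded half up to the nearest integer.
    """
    _check(true_gt, dosage)
    hits = sum(1 for g, d in zip(true_gt, dosage) if int(d + 0.5) == g)
    return hits / len(true_gt)


def dosage_r2(true_gt: Sequence[int], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either vector has zero variance (e.g. a monomorphic site),
    because the correlation is undefined in that case.
    """
    _check(true_gt, dosage)
    n = len(true_gt)
    mean_g = sum(true_gt) / n
    mean_d = sum(dosage) / n
    sxy = sum((g - mean_g) * (d - mean_d) for g, d in zip(true_gt, dosage))
    sxx = sum((g - mean_g) ** 2 for g in true_gt)
    syy = sum((d - mean_d) ** 2 for d in dosage)
    if sxx == 0 or syy == 0:
        return nan
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up example values for one variant across six masked samples.
    truth = [0, 0, 1, 2, 1, 0]
    imputed = [0.1, 0.0, 0.9, 1.8, 1.2, 0.4]
    print("concordance:", concordance(truth, imputed))
    print("dosage r2:", round(dosage_r2(truth, imputed), 3))
```

Concordance alone can look high for rare variants, because calling every sample homozygous for the common allele is usually "correct"; for this reason dosage r² is often reported alongside it, typically stratified by minor allele frequency.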

Keywords: Deep sequencing; GWAS; Genotype imputation; Genotyping; Imputation; Imputation accuracy; NGS data analysis; Reference panel; Untyped variants.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Genome-Wide Association Study / methods
  • Genotype
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Polymorphism, Single Nucleotide / genetics*
  • Whole Genome Sequencing / methods