Rapid, Reference-Free human genotype imputation with denoising autoencoders

Elife. 2022 Sep 23;11:e75600. doi: 10.7554/eLife.75600.

Abstract

Genotype imputation is a foundational tool for population genetics. Standard statistical imputation approaches rely on the co-location of large whole-genome sequencing-based reference panels, powerful computing environments, and potentially sensitive genetic study data. This creates computational-resource and privacy-risk barriers that limit access to cutting-edge imputation techniques. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium. Artificial neural network-based imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Here, we demonstrate an autoencoder-based approach for genotype imputation, using a large, commonly used reference panel, and spanning the entirety of human chromosome 22. Our autoencoder-based genotype imputation strategy achieved superior imputation accuracy across the allele-frequency spectrum and across genomes of diverse ancestry, while delivering at least fourfold faster inference run time relative to standard imputation tools.
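To make the general denoising-autoencoder idea concrete, here is a minimal NumPy sketch. It is not the authors' architecture, genotype encoding, or training procedure. The toy founder-haplotype data, the one-hot coding of genotypes as 0/1/2 alternate-allele counts, the layer sizes, the masking rate, and every function name (`simulate_genotypes`, `corrupt`, `train`, `impute`, ...) are hypothetical choices made for illustration. During training, random genotypes are hidden and the network learns to reconstruct the full genotype vector. After training, only the learned weights are needed to fill in untyped genotypes, which is the sense in which such a model is "reference-free" and portable at inference time.

```python
import numpy as np

N_CLASSES = 3  # hypothetical coding: alternate-allele count 0, 1 or 2


def simulate_genotypes(n, m, n_founders, rng):
    """Toy data (hypothetical): diploid genotypes built from a few founder
    haplotypes, so variants are correlated (a crude stand-in for LD)."""
    founders = rng.integers(0, 2, size=(n_founders, m))
    h1 = founders[rng.integers(0, n_founders, size=n)]
    h2 = founders[rng.integers(0, n_founders, size=n)]
    return h1 + h2  # (n, m) ints in {0, 1, 2}


def one_hot(G):
    """Encode an (n, m) genotype matrix as an (n, 3m) one-hot matrix."""
    n, m = G.shape
    X = np.zeros((n, m, N_CLASSES))
    X[np.arange(n)[:, None], np.arange(m)[None, :], G] = 1.0
    return X.reshape(n, m * N_CLASSES)


def corrupt(X, m, rate, rng):
    """Denoising step: hide a random fraction of genotypes by zeroing their one-hot block."""
    n = X.shape[0]
    missing = rng.random((n, m)) < rate
    Xc = X.reshape(n, m, N_CLASSES).copy()
    Xc[missing] = 0.0
    return Xc.reshape(n, m * N_CLASSES), missing


def init_params(m, hidden, rng):
    d = m * N_CLASSES
    return {"W1": rng.normal(0, 0.1, (d, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0, 0.1, (hidden, d)), "b2": np.zeros(d)}


def forward(p, X, m):
    """Encoder (tanh) then decoder with a softmax over the 3 classes of each variant."""
    H = np.tanh(X @ p["W1"] + p["b1"])
    logits = (H @ p["W2"] + p["b2"]).reshape(-1, m, N_CLASSES)
    logits -= logits.max(axis=2, keepdims=True)  # numerically stable softmax
    P = np.exp(logits)
    P /= P.sum(axis=2, keepdims=True)
    return H, P


def loss(P, Y):
    """Mean cross-entropy per genotype against the clean one-hot target Y (n, m, 3)."""
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=2))


def train(G, hidden=16, rate=0.3, lr=0.5, epochs=500, seed=0):
    """Full-batch gradient descent; a fresh random mask is drawn every epoch."""
    rng = np.random.default_rng(seed)
    n, m = G.shape
    X = one_hot(G)
    Y = X.reshape(n, m, N_CLASSES)
    p = init_params(m, hidden, rng)
    for _ in range(epochs):
        Xc, _ = corrupt(X, m, rate, rng)
        H, P = forward(p, Xc, m)
        # Gradient of the mean cross-entropy with respect to the decoder logits.
        dlogits = ((P - Y) / (n * m)).reshape(n, m * N_CLASSES)
        dZ1 = (dlogits @ p["W2"].T) * (1.0 - H ** 2)  # backprop through tanh
        p["W2"] -= lr * (H.T @ dlogits)
        p["b2"] -= lr * dlogits.sum(axis=0)
        p["W1"] -= lr * (Xc.T @ dZ1)
        p["b1"] -= lr * dZ1.sum(axis=0)
    return p


def impute(p, G_obs, missing):
    """Fill only the masked genotypes with the most probable class; keep observed calls."""
    n, m = G_obs.shape
    X = one_hot(np.where(missing, 0, G_obs)).reshape(n, m, N_CLASSES)
    X[missing] = 0.0  # untyped genotypes are presented as all-zero blocks
    _, P = forward(p, X.reshape(n, m * N_CLASSES), m)
    return np.where(missing, P.argmax(axis=2), G_obs)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = simulate_genotypes(n=500, m=20, n_founders=4, rng=rng)
    params = train(G)
    _, missing = corrupt(one_hot(G), G.shape[1], 0.3, rng)
    G_obs = np.where(missing, -1, G)  # -1 marks an untyped genotype
    G_imp = impute(params, G_obs, missing)
    print("masked-genotype accuracy:", (G_imp[missing] == G[missing]).mean())
```

Masking whole genotypes, rather than adding continuous noise, mirrors the imputation task itself: the network sees the same kind of gaps during training that it must fill at inference time.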

Keywords: artificial intelligence; autoencoder; computational biology; deep learning; genetics; genomics; human; imputation; population genetics; systems biology.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Gene Frequency
  • Genetics, Population*
  • Genome-Wide Association Study / methods
  • Genotype
  • Humans
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide*

Associated data

  • dbGaP/phs001416.v2.p1
  • dbGaP/phs001211.v4.p3