Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank

Nat Genet. 2023 Jul;55(7):1243-1249. doi: 10.1038/s41588-023-01415-w. Epub 2023 Jun 29.

Abstract

Phasing is the process of separating the two parentally inherited copies of each chromosome into haplotypes. Here, we introduce SHAPEIT5, a new phasing method that quickly and accurately processes large sequencing datasets, and apply it to UK Biobank (UKB) whole-genome and whole-exome sequencing data. We demonstrate that SHAPEIT5 phases rare variants with switch error rates below 5% for variants present in just 1 sample out of 100,000. Furthermore, we outline a method for phasing singletons, which, although less precise, constitutes an important step towards future developments. We then demonstrate that the use of UKB as a reference panel improves the accuracy of genotype imputation, an improvement that is even more pronounced when the panel is phased with SHAPEIT5 rather than with other methods. Finally, we screen the UKB data for loss-of-function compound heterozygous events and identify 549 genes where both gene copies are knocked out. These genes complement current knowledge of gene essentiality in the human genome.
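
The abstract reports accuracy as a switch error rate without defining it. As a rough illustration only, the sketch below assumes the commonly used definition: among consecutive pairs of heterozygous sites, the fraction whose relative phase disagrees with a trusted reference phasing. The function name `switch_error_rate` and the list-of-allele-pairs input layout are hypothetical choices made for this example; they are not part of SHAPEIT5 or its tooling.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # (allele on haplotype 0, allele on haplotype 1)


def switch_error_rate(truth: List[Genotype], inferred: List[Genotype]) -> float:
    """Fraction of consecutive heterozygous-site pairs whose relative phase
    differs between `inferred` and `truth` (assumed definition, see lead-in).

    Both lists must describe the same variants in the same order.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must have the same length")

    # For each heterozygous site, record whether the inferred haplotypes
    # match the truth as-is (0) or with the two haplotypes swapped (1).
    orientations = []
    for (t0, t1), (i0, i1) in zip(truth, inferred):
        if t0 == t1:
            continue  # homozygous sites carry no phase information
        if (i0, i1) == (t0, t1):
            orientations.append(0)
        elif (i0, i1) == (t1, t0):
            orientations.append(1)
        else:
            raise ValueError("inferred genotype does not match truth genotype")

    # With fewer than two heterozygous sites there is nothing to switch;
    # this sketch returns 0.0 in that case.
    if len(orientations) < 2:
        return 0.0

    # A switch occurs whenever the orientation changes between neighbours.
    switches = sum(a != b for a, b in zip(orientations, orientations[1:]))
    return switches / (len(orientations) - 1)


# Toy example: four heterozygous sites, phase flips once after the second.
truth = [(0, 1), (1, 0), (0, 0), (0, 1), (1, 0)]
inferred = [(0, 1), (1, 0), (0, 0), (1, 0), (0, 1)]
rate = switch_error_rate(truth, inferred)
```

In the toy example the inferred haplotypes agree with the truth at the first two heterozygous sites and are swapped at the last two, giving one switch across three consecutive pairs. A phasing that is swapped at every site counts as zero switches, because only the relative phase between neighbouring sites matters.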

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Specimen Banks*
  • Exome Sequencing
  • Genome, Human* / genetics
  • Genotype
  • Haplotypes
  • Humans
  • Polymorphism, Single Nucleotide / genetics
  • Sequence Analysis, DNA / methods
  • United Kingdom