LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants

Bioinformatics. 2022 Mar 28;38(7):1816-1822. doi: 10.1093/bioinformatics/btac058.

Abstract

Motivation: Long-read phasing has been used for reconstructing diploid genomes, improving variant calling and resolving microbial strains in metagenomics. However, the phasing blocks of existing methods are broken by large structural variations (SVs), and their efficiency is unsatisfactory for population-scale phasing.

Results: This article presents a novel algorithm, LongPhase, which can simultaneously phase single nucleotide polymorphisms (SNPs) and SVs of a human genome in 10-20 min, 10× faster than the state-of-the-art tools WhatsHap, HapCUT2 and Margin. In particular, co-phasing SNPs and SVs produces much larger haplotype blocks (N50 = 25 Mbp) than those of existing methods (N50 = 10-15 Mbp). We show that LongPhase combined with Nanopore ultra-long reads is a cost-effective and highly contiguous solution, which can produce between 1 and 26 blocks per chromosome arm without the need for additional trio, chromosome-conformation or Strand-seq data.
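
For readers unfamiliar with the block N50 metric quoted above, the following is a minimal sketch of how such a value is commonly computed. It assumes the usual definition (the block length at which blocks of that size or larger cover at least half of the total phased length); the abstract does not state which normalization the authors use, so this is an illustration and not necessarily their exact procedure. The function name `block_n50` and the example lengths are hypothetical.

```python
def block_n50(block_lengths):
    """Return the N50 of a collection of haplotype block lengths.

    Assumption: N50 is taken relative to the summed length of all blocks,
    not relative to the genome or chromosome length.
    """
    lengths = sorted(block_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first block where the cumulative length
        # reaches at least half of the total phased length.
        if 2 * running >= total:
            return length
    return 0  # no blocks supplied


# Hypothetical block lengths in Mbp, for illustration only.
example_blocks = [8, 4, 4, 2, 2]
example_n50 = block_n50(example_blocks)
```

With the hypothetical lengths above, the total is 20 Mbp, and the cumulative sum first reaches half of that at the second block, so the N50 is 4 Mbp. A few long blocks raise the N50 sharply, which is why fewer block breaks at SVs would translate into a larger reported value.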

Availability and implementation: LongPhase is freely available at https://github.com/twolinin/LongPhase/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Chromosomes / genetics
  • Genome, Human
  • Haplotypes
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Sequence Analysis, DNA