Nanopanel2 calls phased low-frequency variants in Nanopore panel sequencing data

Bioinformatics. 2021 Dec 11;37(24):4620-4625. doi: 10.1093/bioinformatics/btab526.

Abstract

Motivation: Clinical decision making is increasingly guided by accurate and recurrent determination of the presence and frequency of (somatic) variants and their haplotypes through panel sequencing of disease-relevant genomic regions. Haplotype calling (phasing), however, is difficult and error-prone unless variants are located on the same read, which limits the ability of short-read sequencing to detect, e.g., co-occurrence of drug-resistance variants. Long-read panel sequencing enables direct phasing of amplicon variants, among multiple other benefits; however, the high error rates of current technologies have prevented its application in the past.

Results: We have developed Nanopanel2, a variant caller for Nanopore panel sequencing data. Nanopanel2 works directly on base-called FAST5 files and uses allele probability distributions and several other filters to robustly separate true-positive from false-positive (FP) calls. It effectively calls SNVs and INDELs with variant allele frequencies (VAF) as low as 1% and 5%, respectively, and produces only a few low-frequency FP calls (∼1 FP call with VAF<5% per kb amplicon). Haplotype compositions are then determined by direct phasing. Nanopanel2 is the first somatic variant caller for Nanopore data, enabling accurate, fast (turnaround <48 h) and cheap (sequencing costs ∼$10/sample) diagnostic workflows.
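
As an illustration of the direct phasing idea described above, the following minimal Python sketch counts which allele combinations co-occur on the same long read and reports their relative frequencies. It is not Nanopanel2's implementation: the function name, the read representation (a dict mapping reference position to observed allele) and the example positions are all hypothetical, chosen only to show why variants observed on a single read can be phased without statistical inference.

```python
from collections import Counter

def haplotype_composition(reads, variant_positions):
    """Estimate haplotype fractions by direct phasing.

    reads: list of dicts mapping position -> observed allele (hypothetical format).
    variant_positions: positions of the called variants to phase.
    """
    counts = Counter()
    for read in reads:
        # Only reads that cover every variant position can be phased directly.
        if all(pos in read for pos in variant_positions):
            haplotype = tuple(read[pos] for pos in variant_positions)
            counts[haplotype] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Fraction of fully covering reads that support each haplotype.
    return {hap: n / total for hap, n in counts.items()}

# Toy example: three reads span both positions, one read covers only the first.
reads = [
    {100: "A", 200: "T"},
    {100: "A", 200: "T"},
    {100: "G", 200: "C"},
    {100: "A"},
]
composition = haplotype_composition(reads, [100, 200])
```

In this toy input, the read that covers only position 100 is ignored, so the haplotype fractions are computed over the three reads that span both sites. A real caller would additionally have to account for sequencing errors at each site, which this sketch does not attempt.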

Availability and implementation: The data for this study have been deposited at zenodo.org under the DOI accession numbers 4110691 and 4110698. Nanopanel2 is open source and available at https://github.com/popitsch/nanopanel2.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Genomics
  • High-Throughput Nucleotide Sequencing
  • Nanopores*
  • Sequence Analysis, DNA
  • Software*