MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data

BMC Genomics. 2020 Dec 21;21(Suppl 6):500. doi: 10.1186/s12864-020-06875-6.

Abstract

Background: Next-generation sequencing (NGS) enables unbiased detection of pathogens by mapping the sequencing reads of a patient sample to the known reference sequences of bacteria and viruses. However, for a new pathogen without a reference sequence of a close relative, or with a high load of mutations compared to its predecessors, read mapping fails because of the low similarity between the pathogen and the reference sequence, which in turn leads to insensitive and inaccurate pathogen detection.

Results: We developed MegaPath, which runs fast and provides high sensitivity in detecting new pathogens. In MegaPath, we have implemented and tested a combination of polishing techniques to remove non-informative human reads and spurious alignments. MegaPath applies a global optimization to the read alignments and reassigns the reads incorrectly aligned to multiple species to a unique species. The reassignment not only significantly increased the number of reads aligned to distant pathogens, but also significantly reduced incorrect alignments. MegaPath implements an enhanced maximum-exact-match prefix seeding strategy and a SIMD-accelerated Smith-Waterman algorithm to run fast.
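The reassignment step described above can be illustrated with a small sketch. This is not MegaPath's actual optimization, which the abstract does not specify; it is a simplified greedy heuristic, assuming that each read comes with a table of candidate species and alignment scores, and that species supported by more uniquely aligned reads are the more plausible source of an ambiguous read. The function name `reassign_reads`, the input layout, and the example scores are all hypothetical.

```python
from collections import Counter


def reassign_reads(alignments):
    """Assign every read to exactly one species.

    alignments: dict mapping read_id -> dict of {species: alignment_score}.
    Returns a dict mapping read_id -> chosen species.

    Hypothetical heuristic, not MegaPath's method: ambiguous reads go to
    the candidate species with the most uniquely aligned reads.
    """
    # Step 1: count reads that align to a single species only.
    unique_support = Counter()
    for hits in alignments.values():
        if len(hits) == 1:
            unique_support[next(iter(hits))] += 1

    # Step 2: resolve each read. Rank candidates by unique support first,
    # then by alignment score, then by name so ties are deterministic.
    assignment = {}
    for read_id, hits in alignments.items():
        assignment[read_id] = max(
            hits, key=lambda sp: (unique_support[sp], hits[sp], sp)
        )
    return assignment


# Toy input: r3 and r5 each align to two species.
example = {
    "r1": {"E. coli": 95},
    "r2": {"E. coli": 90},
    "r3": {"E. coli": 80, "S. enterica": 82},
    "r4": {"S. enterica": 88},
    "r5": {"K. pneumoniae": 70, "S. enterica": 70},
}
result = reassign_reads(example)
for read_id in sorted(result):
    print(read_id, "->", result[read_id])
```

In this toy input, r3 goes to E. coli even though its score against S. enterica is slightly higher, because E. coli has more uniquely aligned reads. K. pneumoniae, which has no unique support, receives no reads at all, which is the kind of spurious species call a reassignment step is meant to suppress.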

Conclusions: In our benchmarks, MegaPath demonstrated superior sensitivity by detecting eight times more reads from a low-similarity pathogen than other tools. MegaPath also ran much faster than the other state-of-the-art alignment-based pathogen detection tools, and its speed was comparable to that of the less sensitive profile-based pathogen detection tools. The running time of MegaPath is about 20 min on a typical 1 Gb dataset.

Keywords: Abundance detection; Next generation sequencing; Pathogen detection; Read alignment; Shotgun metagenomic sequencing.

MeSH terms

  • Algorithms
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Metagenome
  • Metagenomics*
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Software*