MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data

Chi-Ming Leung; Dinghua Li; Yan Xin; Wai-Chun Law; Yifan Zhang; Hing-Fung Ting; Ruibang Luo; Tak-Wah Lam

doi:10.1186/s12864-020-06875-6

BMC Genomics. 2020 Dec 21;21(Suppl 6):500. doi: 10.1186/s12864-020-06875-6.

Authors

Chi-Ming Leung^{1,2}, Dinghua Li³, Yan Xin^{3,4}, Wai-Chun Law⁴, Yifan Zhang^{3,4}, Hing-Fung Ting³, Ruibang Luo^{3,4}, Tak-Wah Lam^{3,4}

Affiliations

¹ Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong. cmleung2@cs.hku.hk.
² L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong. cmleung2@cs.hku.hk.
³ Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong.
⁴ L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong.

Abstract

Background: Next-generation sequencing (NGS) enables unbiased detection of pathogens by mapping the sequencing reads of a patient sample to the known reference sequences of bacteria and viruses. However, for a new pathogen without a reference sequence of a close relative, or with a high load of mutations compared to its predecessors, read mapping fails due to low similarity between the pathogen and the reference sequence, which in turn leads to insensitive and inaccurate pathogen detection outcomes.

Results: We developed MegaPath, which runs fast and provides high sensitivity in detecting new pathogens. In MegaPath, we have implemented and tested a combination of polishing techniques to remove non-informative human reads and spurious alignments. MegaPath applies a global optimization to the read alignments and reassigns the reads incorrectly aligned to multiple species to a unique species. The reassignment not only significantly increased the number of reads aligned to distant pathogens, but also significantly reduced incorrect alignments. MegaPath implements an enhanced maximum-exact-match prefix seeding strategy and a SIMD-accelerated Smith-Waterman algorithm to run fast.
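For readers unfamiliar with the alignment step named above, the following is a minimal scalar sketch of Smith-Waterman local alignment scoring. It is not MegaPath's implementation: MegaPath uses a SIMD-accelerated version, and the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen purely for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Illustrative only: linear gap penalty and scoring values are assumed,
    not taken from MegaPath.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 in local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            # Clamping at 0 lets an alignment restart anywhere (local).
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Keeping only two rows gives memory linear in the target length. A SIMD version computes many cells of this recurrence per instruction, which is where the speed-up comes from.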

Conclusions: In our benchmarks, MegaPath demonstrated superior sensitivity by detecting eight times more reads from a low-similarity pathogen than other tools. Meanwhile, MegaPath ran much faster than the other state-of-the-art alignment-based pathogen detection tools (and was comparable in speed with the less sensitive profile-based pathogen detection tools). The running time of MegaPath is about 20 min on a typical 1 Gb dataset.

Keywords: Abundance detection; Next generation sequencing; Pathogen detection; Read alignment; Shotgun metagenomic sequencing.

MeSH terms

Algorithms
High-Throughput Nucleotide Sequencing
Humans
Metagenome
Metagenomics*
Sequence Alignment
Sequence Analysis, DNA
Software*
