Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction

Nucleic Acids Res. 2021 Sep 27;49(17):e102. doi: 10.1093/nar/gkab576.

Abstract

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Early detection and identification of minority viral haplotypes may therefore help to promptly adjust a patient's treatment plan and prevent potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification, and eliminating that noise remains an open, non-trivial problem. Here we propose CliqueSNV, a method based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables the identification of minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.
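To illustrate what "statistically linked mutations" can mean in practice, the following minimal Python sketch tests whether two minor variants co-occur on the same reads more often than independent sequencing errors would predict. It is not the CliqueSNV implementation: the function names, the read-count inputs, and the choice of a one-sided binomial tail test are all assumptions made for this example.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, for 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def linked_pair_pvalue(n_reads, n_a, n_b, n_ab):
    """Hypothetical linkage test for two minor variants A and B.

    n_reads: reads covering both positions
    n_a, n_b: reads carrying variant A / variant B
    n_ab: reads carrying both variants

    Null hypothesis (an assumption of this sketch): A and B arise
    independently, so a read carries both with probability
    (n_a / n_reads) * (n_b / n_reads). The returned p-value is the
    upper binomial tail P(X >= n_ab).
    """
    if n_ab == 0:
        return 1.0
    p_both = (n_a / n_reads) * (n_b / n_reads)
    if p_both >= 1.0:
        return 1.0
    tail = sum(
        math.exp(log_binom_pmf(k, n_reads, p_both))
        for k in range(n_ab, n_reads + 1)
    )
    return min(1.0, tail)


if __name__ == "__main__":
    # Two variants at 0.2% each that appear together on 15 of 10,000 reads.
    print(linked_pair_pvalue(10_000, 20, 20, 15))
    # The same variants appearing together on a single read.
    print(linked_pair_pvalue(10_000, 20, 20, 1))
```

A small p-value suggests that the two variants sit on the same haplotype rather than being coincident errors, which is how a pairwise test of this kind can pick up variants whose individual frequencies are below the per-base error rate. A real pipeline would also need to correct for the number of position pairs tested.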

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • COVID-19 / diagnosis
  • COVID-19 / virology
  • Computational Biology / methods*
  • Gene Frequency
  • HIV Infections / diagnosis
  • HIV Infections / virology
  • HIV-1 / genetics
  • Haplotypes*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Mutation
  • Polymorphism, Single Nucleotide
  • RNA Virus Infections / diagnosis*
  • RNA Virus Infections / virology
  • RNA Viruses / genetics*
  • Reproducibility of Results
  • SARS-CoV-2 / genetics
  • Sensitivity and Specificity