Benchmarking of viral haplotype reconstruction programmes: an overview of the capacities and limitations of currently available programmes

Brief Bioinform. 2014 May;15(3):431-42. doi: 10.1093/bib/bbs081. Epub 2012 Dec 19.

Abstract

Viral haplotype reconstruction from a set of observed reads is one of the most challenging problems in bioinformatics today. Next-generation sequencing technologies enable us to detect single-nucleotide polymorphisms (SNPs) of haplotypes-even if the haplotypes appear at low frequencies. However, there are two major problems. First, we need to distinguish real SNPs from sequencing errors. Second, we need to determine which SNPs occur on the same haplotype, which cannot be inferred from the reads if the distance between SNPs on a haplotype exceeds the read length. We conducted an independent benchmarking study that directly compares the currently available viral haplotype reconstruction programmes. We also present nine in silico data sets that we generated to reflect biologically plausible populations. For these data sets, we simulated 454 and Illumina reads and applied the programmes to test their capacity to reconstruct whole genomes and individual genes. We developed a novel statistical framework to demonstrate the strengths and limitations of the programmes. Our benchmarking demonstrated that all the programmes we tested performed poorly when sequence divergence was low and failed to recover haplotype populations with rare haplotypes.

Keywords: benchmarking; in silico data sets; quasispecies; statistics for validation; viral haplotype reconstruction programmes.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Computational Biology / methods
  • Computer Simulation
  • Databases, Nucleic Acid / statistics & numerical data
  • Evolution, Molecular
  • Foot-and-Mouth Disease Virus / genetics
  • HIV-1 / genetics
  • Haplotypes*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Polymorphism, Single Nucleotide
  • RNA Viruses / genetics*
  • Software