Identification of pathogen genomic variants through an integrated pipeline

BMC Bioinformatics. 2014 Mar 3;15:63. doi: 10.1186/1471-2105-15-63.

Abstract

Background: Whole-genome sequencing represents a powerful experimental tool for pathogen research. We present methods for the analysis of small eukaryotic genomes, including a streamlined system (called Platypus) for finding single nucleotide and copy number variants as well as recombination events.

Results: We have validated our pipeline using four sets of Plasmodium falciparum drug-resistant data containing 26 clones from 3D7 and Dd2 background strains, identifying an average of 11 single nucleotide variants per clone. We also identify 8 copy number variants with contributions to resistance, and report for the first time that all analyzed amplification events are in tandem.

Conclusions: The Platypus pipeline provides malaria researchers with a powerful tool to analyze short-read sequencing data. It provides an accurate way to detect SNVs using known software packages, and a novel methodology for detection of CNVs, though it does not currently support detection of small indels. We have validated that the pipeline detects known SNVs in a variety of samples while filtering out spurious data. We bundle the methods into a freely available package.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antimalarials / pharmacology
  • DNA Copy Number Variations / genetics*
  • DNA, Protozoan / genetics
  • Drug Resistance / genetics
  • Genome, Protozoan / genetics*
  • Genomics / methods*
  • Plasmodium falciparum / drug effects
  • Plasmodium falciparum / genetics*
  • Polymorphism, Single Nucleotide / genetics
  • Sequence Analysis, DNA / methods
  • Software*

Substances

  • Antimalarials
  • DNA, Protozoan