Vector sequence contamination of the Plasmodium vivax sequence database in PlasmoDB and in silico correction of 26 parasite sequences

Parasit Vectors. 2015 Jun 12;8:318. doi: 10.1186/s13071-015-0927-x.

Abstract

We identified a 47-amino-acid (aa) protein sequence that occurs 17 times in the Plasmodium vivax nucleotide database published on PlasmoDB. Analysis of the coding sequence revealed multiple restriction enzyme sites within the 141 bp nucleotide sequence and a His6 tag attached to the 3' end, suggesting that the sequence originated from a cloning vector. Sequences with vector contamination were submitted to NCBI, and BLASTN was used to cross-examine whole-genome shotgun (WGS) contigs from four recently deposited P. vivax whole-genome sequencing projects. At least 26 genes listed in PlasmoDB incorporate this cloning vector sequence into their predicted provisional protein products.
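
The two signals described in the abstract, restriction enzyme sites and a run of six histidine codons, can be screened for with a simple pattern search. The following is a minimal sketch, not the authors' pipeline: the enzyme list is an illustrative subset of common cloning enzymes, the function names and the `min_sites` threshold are assumptions, and the `toy` sequence is invented for demonstration and is not the 141 bp sequence from the paper.

```python
import re

# Recognition sites of a few common cloning enzymes (illustrative subset,
# not the set of enzymes examined in the paper).
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}

# Six consecutive histidine codons (CAT or CAC). Reading frame is not checked.
HIS6_PATTERN = re.compile(r"(?:CA[TC]){6}")


def find_restriction_sites(seq):
    """Return {enzyme: [0-based start positions]} for sites present in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in RESTRICTION_SITES.items():
        # A lookahead lets overlapping occurrences be reported as well.
        positions = [m.start() for m in re.finditer(f"(?={site})", seq)]
        if positions:
            hits[name] = positions
    return hits


def has_his6_tag(seq):
    """True if seq contains six consecutive histidine codons."""
    return HIS6_PATTERN.search(seq.upper()) is not None


def looks_like_vector(seq, min_sites=2):
    """Hypothetical heuristic: several distinct enzyme sites plus a His6 run."""
    return len(find_restriction_sites(seq)) >= min_sites and has_his6_tag(seq)


# Invented toy sequence: start codon, four cloning sites, His6 codons, stop.
toy = "ATGGGATCCGAATTCAAGCTTCTCGAGCACCACCACCACCACCACTGA"
print(find_restriction_sites(toy))
print(looks_like_vector(toy))
```

A heuristic like this only flags candidates. Confirming vector origin would still require comparison against known vector sequences, for example with BLASTN as the abstract describes.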

Publication types

  • Letter
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Computer Simulation
  • Databases, Nucleic Acid*
  • Genetic Vectors / genetics*
  • Genome, Protozoan
  • Molecular Sequence Data
  • Plasmodium vivax / classification
  • Plasmodium vivax / genetics*