PepLine: a software pipeline for high-throughput direct mapping of tandem mass spectrometry data on genomic sequences

J Proteome Res. 2008 May;7(5):1873-83. doi: 10.1021/pr070415k. Epub 2008 Mar 19.

Abstract

PepLine is a fully automated software pipeline that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped onto the six-frame translations of genomic sequences (second module), yielding hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component so that the whole pipeline can run in a fully automated manner on raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana chloroplast envelope samples. Our results demonstrate that PepLine was competitive with protein database search software and was fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for detecting the intron/exon structure of genes.
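
The second and third modules described above (mapping tags onto six-frame translations, then clustering hits) can be illustrated with a minimal Python sketch. It is not PepLine's implementation: it assumes a tag is reduced to its amino acid sequence alone (ignoring the flanking masses a real PST carries), uses exact string matching on an uppercase ACGT genome, and groups hits with a simple distance threshold. The function names and the `max_gap` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(genome):
    """Yield (strand, offset, protein) for the six reading frames."""
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def find_tag_hits(genome, tag):
    """Return (strand, start, end) hits in 0-based, end-exclusive forward-strand coordinates."""
    n, hits = len(genome), []
    for strand, offset, protein in six_frames(genome):
        idx = protein.find(tag)
        while idx != -1:
            start = offset + 3 * idx
            end = start + 3 * len(tag)
            if strand == "-":
                # Convert positions on the reverse complement back to the forward strand.
                start, end = n - end, n - start
            hits.append((strand, start, end))
            idx = protein.find(tag, idx + 1)
    return hits


def cluster_hits(hits, max_gap=3000):
    """Group same-strand hits whose start lies within max_gap of the previous hit's end.

    Assumption: a plain distance threshold stands in for the clustering step.
    """
    clusters = []
    for hit in sorted(hits, key=lambda h: (h[0], h[1])):
        last = clusters[-1] if clusters else None
        if last and last[-1][0] == hit[0] and hit[1] - last[-1][2] <= max_gap:
            last.append(hit)
        else:
            clusters.append([hit])
    return clusters
```

Calling `cluster_hits(find_tag_hits(genome, tag))` for each tag and pooling the results would give candidate regions; a cluster that spans several separated hits on one strand is the kind of pattern that could hint at exons split by introns.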

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Arabidopsis / chemistry
  • Arabidopsis / cytology
  • Arabidopsis / genetics
  • Arabidopsis Proteins / analysis*
  • Arabidopsis Proteins / genetics
  • Base Sequence
  • Chloroplasts / chemistry
  • Chloroplasts / genetics
  • Genome*
  • Mass Spectrometry* / instrumentation
  • Mass Spectrometry* / methods
  • Molecular Sequence Data
  • Peptides / analysis*
  • Peptides / genetics
  • Sequence Alignment
  • Software*

Substances

  • Arabidopsis Proteins
  • Peptides