A legume specific protein database (LegProt) improves the number of identified peptides, confidence scores and overall protein identification success rates for legume proteomics

Phytochemistry. 2011 Jul;72(10):1020-7. doi: 10.1016/j.phytochem.2011.01.026. Epub 2011 Feb 23.

Abstract

A legume specific protein database (LegProt) has been created containing sequences from seven legume species, i.e., Glycine max, Lotus japonicus, Medicago sativa, Medicago truncatula, Lupinus albus, Phaseolus vulgaris, and Pisum sativum. The database consists of amino acid sequences translated from predicted gene models and 6-frame translations of tentative consensus (TC) sequences assembled from expressed sequence tags (ESTs) and singleton ESTs. This database was queried using mass spectral data for protein identification, and the identification success rates were compared with those obtained using the NCBI nr database. Specifically, Mascot MS/MS ion searches of tandem nano-LC Q-TOFMS/MS mass spectral data showed that, relative to the NCBI nr protein database, the LegProt database yielded a 54% increase in the average protein score (i.e., from NCBI nr 480 to LegProt 739) and a 50% increase in the average number of matched peptides (i.e., from NCBI nr 8 to LegProt 12). The overall identification success rate also increased from 88% (NCBI nr) to 93% (LegProt). Mascot peptide mass fingerprinting (PMF) searches of the LegProt database using MALDI-TOFMS data yielded a significant increase in the identification success rate from 19% (NCBI nr) to 34% (LegProt), while the average scores and the average number of matched peptides did not change significantly. The results demonstrate that the LegProt database significantly increases legume protein identification success rates and confidence levels compared to the commonly used NCBI nr database. These improvements are primarily due to the presence of a large number of legume specific TC sequences in the LegProt database that were not found in NCBI nr. The LegProt database is freely available for download (http://bioinfo.noble.org/manuscript-support/legumedb) and will serve as a valuable resource for legume proteomics.
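
The 6-frame translation step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_translate`) are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G, or T is written as `X`. Each nucleotide sequence is translated in three reading frames on the forward strand and three on the reverse complement.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not taken from the LegProt paper.

BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq: str) -> dict:
    """Translate a DNA sequence in all six reading frames.

    Keys are '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3'
    (reverse complement); '*' marks a stop codon.
    """
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

In a database-building setting, each translated frame would typically be split at stop codons and short fragments discarded before the result is written out as protein entries; those steps are omitted here because the abstract does not describe them.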

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Databases, Protein
  • Peptides / analysis*
  • Plant Proteins / analysis*
  • Proteomics*
  • Tandem Mass Spectrometry

Substances

  • Peptides
  • Plant Proteins