Open MS/MS spectral library search to identify unanticipated post-translational modifications and increase spectral identification rate

Bioinformatics. 2010 Jun 15;26(12):i399-406. doi: 10.1093/bioinformatics/btq185.

Abstract

Motivation: Identifying post-translationally modified proteins has become a central problem in proteomics. Spectral library search is a new and promising computational approach to mass spectrometry-based protein identification, but its potential for identifying unanticipated post-translational modifications has rarely been explored. Existing spectral library search tools match a query spectrum only against reference library spectra with the same peptide mass, so spectra of peptides carrying unanticipated modifications cannot be identified.
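
The limitation described above can be illustrated with a minimal sketch of candidate selection by precursor mass. The entry names, masses, and tolerance values below are made up for illustration and are not taken from the paper or from any existing tool; the point is only that a narrow mass window excludes the unmodified library entry when the query peptide carries an unexpected mass shift, whereas a wide ("open") window retains it.

```python
PROTON = 1.007276  # approximate proton mass in Da

def neutral_mass(precursor_mz, charge):
    """Convert a precursor m/z and charge state to a neutral peptide mass."""
    return (precursor_mz - PROTON) * charge

def candidates(query_mass, library, tol):
    """Return names of library entries whose mass is within tol Da of the query."""
    return [name for name, mass in library if abs(query_mass - mass) <= tol]

# Hypothetical library of (name, neutral mass in Da); values are illustrative only.
library = [("peptide_A", 927.45), ("peptide_B", 802.40)]

# Query: peptide_A carrying an unanticipated modification of roughly +80 Da.
query_mass = 927.45 + 79.97

closed = candidates(query_mass, library, tol=3.0)    # narrow window: same-mass search
opened = candidates(query_mass, library, tol=300.0)  # wide window: open search

print(closed)  # the unmodified entry is not within the narrow window
print(opened)  # the wide window keeps it as a candidate
```

A wide window admits many more candidates per query, which is why an open search needs a scoring scheme that accounts for the mass difference and a way to control false matches.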

Results: In this article, we present an open spectral library search tool named pMatch. It extends existing library search algorithms in at least three respects to support the identification of unanticipated modifications. First, the spectra in the library are optimized using the full peptide sequence information to better tolerate variations in peptide fragmentation patterns caused by modifications. Second, a new scoring system is devised that uses charge-dependent mass shifts for peak matching and combines a probability-based model with the general spectral dot product. Third, a target-decoy strategy is used for false discovery rate control. To demonstrate the effectiveness of pMatch, a library search experiment was conducted on a public dataset of over 40,000 spectra, in comparison with SpectraST, the most popular library search engine. Additional validation was performed on four published datasets comprising over 150,000 spectra. The results showed that pMatch can effectively identify unanticipated modifications and significantly increase the spectral identification rate.
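
The second and third points above can be sketched roughly as follows. This is not pMatch's actual scoring function: the abstract does not give its details, and the probability-based component is not reproduced here. The sketch assumes that a fragment carrying the modification is displaced by the precursor mass difference divided by the fragment charge, and that the false discovery rate is estimated as the ratio of decoy to target matches above a score cutoff. All function names, the peak representation, and the tolerance are hypothetical.

```python
import math

def match_intensity(query, mz, tol):
    """Return the highest query intensity within tol of mz, or 0.0 if none."""
    hits = [inten for q_mz, inten in query if abs(q_mz - mz) <= tol]
    return max(hits, default=0.0)

def shifted_dot_product(library, query, delta_mass, tol=0.5):
    """Normalized dot product allowing charge-dependent peak shifts.

    library: list of (mz, intensity, fragment_charge)
    query:   list of (mz, intensity)
    Each library peak may match at its own m/z (fragment without the
    modification) or at m/z + delta_mass / charge (fragment carrying it).
    Simplification: a query peak may be reused by several library peaks.
    """
    dot = 0.0
    for mz, inten, charge in library:
        unshifted = match_intensity(query, mz, tol)
        shifted = match_intensity(query, mz + delta_mass / charge, tol)
        dot += inten * max(unshifted, shifted)
    lib_norm = math.sqrt(sum(i * i for _, i, _ in library))
    qry_norm = math.sqrt(sum(i * i for _, i in query))
    if lib_norm == 0.0 or qry_norm == 0.0:
        return 0.0
    return dot / (lib_norm * qry_norm)

def fdr_threshold(matches, max_fdr=0.01):
    """Return the lowest score cutoff with decoys / targets <= max_fdr.

    matches: list of (score, is_decoy). Returns None if no cutoff qualifies.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff

# Illustrative peaks only: two singly charged fragments and one doubly charged.
library_spectrum = [(200.0, 1.0, 1), (300.0, 2.0, 1), (250.0, 1.0, 2)]
# Query with an 80 Da precursor mass difference: the first fragment is
# unshifted, the second is shifted by 80 / 1, the third by 80 / 2.
query_spectrum = [(200.0, 1.0), (380.0, 2.0), (290.0, 1.0)]

score_open = shifted_dot_product(library_spectrum, query_spectrum, delta_mass=80.0)
score_closed = shifted_dot_product(library_spectrum, query_spectrum, delta_mass=0.0)

# Toy list of (score, is_decoy) matches for the target-decoy estimate.
toy_matches = [(0.9, False), (0.8, False), (0.7, True), (0.6, False)]
cutoff = fdr_threshold(toy_matches, max_fdr=0.0)
```

In this toy case the shift-aware score is much higher than the score computed without shifts, because two of the three fragments only line up once the mass difference is applied. Taking the better of the shifted and unshifted match per peak is one simple design choice; it avoids needing to know in advance where on the peptide the modification sits.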

Availability: http://pfind.ict.ac.cn/pmatch/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Protein
  • Mass Spectrometry / methods*
  • Protein Processing, Post-Translational*
  • Proteins / chemistry*
  • Proteomics / methods*

Substances

  • Proteins