Denoising peptide tandem mass spectra for spectral libraries: a Bayesian approach

J Proteome Res. 2013 Jul 5;12(7):3223-32. doi: 10.1021/pr400080b. Epub 2013 Jun 6.

Abstract

With the rapid accumulation of data from shotgun proteomics experiments, it has become feasible to build comprehensive and high-quality spectral libraries of tandem mass spectra of peptides. A spectral library condenses experimental data into a retrievable format and can be used to aid peptide identification by spectral library searching. A key step in spectral library building is spectrum denoising, which is best accomplished by merging multiple replicates of the same peptide ion into a consensus spectrum. However, this approach cannot be applied to "singleton spectra," for which only one observed spectrum is available for the peptide ion. We developed a method, based on a Bayesian classifier, for denoising peptide tandem mass spectra. The classifier accounts for relationships between peaks, and can be trained on the fly from consensus spectra and immediately applied to denoise singleton spectra, without hard-coded knowledge about peptide fragmentation. A linear regression model was also trained to predict the number of useful "signal" peaks in a spectrum, thereby obviating the need for arbitrary thresholds for peak filtering. This Bayesian approach accumulates weak evidence systematically to boost the discrimination power between signal and noise peaks, and produces readily interpretable conditional probabilities that offer valuable insights into peptide fragmentation behaviors. By cross validation, spectra denoised by this method were shown to retain more signal peaks, and have higher spectral similarities to replicates, than those filtered by intensity only.
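
The two-stage idea described in the abstract (a Bayesian classifier that scores each peak, plus a regression that predicts how many peaks to keep) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the features, the classifier structure, or the regression predictor, so everything named here is an assumption: a plain naive Bayes model with Laplace smoothing stands in for the paper's classifier, the binary features are hypothetical indicators of peak relationships (for example, "a complementary peak is present"), and the single regression predictor (for example, peptide length) is likewise hypothetical.

```python
import math


def train_naive_bayes(examples, n_features):
    """Estimate P(signal) and P(feature=1 | class) from labeled peaks.

    examples: list of (features, is_signal), where features is a tuple of
    0/1 values. In the setting of the abstract, labels would come from
    consensus spectra; here they are simply supplied by the caller.
    """
    counts = {True: 0, False: 0}
    feat_counts = {True: [0] * n_features, False: [0] * n_features}
    for feats, is_signal in examples:
        counts[is_signal] += 1
        for i, f in enumerate(feats):
            feat_counts[is_signal][i] += f
    total = counts[True] + counts[False]
    prior = (counts[True] + 1) / (total + 2)  # Laplace smoothing
    cond = {
        label: [(feat_counts[label][i] + 1) / (counts[label] + 2)
                for i in range(n_features)]
        for label in (True, False)
    }
    return prior, cond


def posterior_signal(feats, prior, cond):
    """Return P(signal | features) by summing log-likelihoods per feature."""
    log_s, log_n = math.log(prior), math.log(1 - prior)
    for i, f in enumerate(feats):
        ps, pn = cond[True][i], cond[False][i]
        log_s += math.log(ps if f else 1 - ps)
        log_n += math.log(pn if f else 1 - pn)
    m = max(log_s, log_n)  # subtract the max for numerical stability
    es, en = math.exp(log_s - m), math.exp(log_n - m)
    return es / (es + en)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def denoise(peaks, feats_per_peak, prior, cond, slope, intercept, predictor):
    """Keep the k most probable signal peaks, k predicted by the regression.

    peaks: list of (mz, intensity); predictor: hypothetical scalar input
    to the regression (assumed here, not taken from the abstract).
    """
    k = round(slope * predictor + intercept)
    k = min(len(peaks), max(1, k))
    ranked = sorted(
        zip(peaks, feats_per_peak),
        key=lambda pf: posterior_signal(pf[1], prior, cond),
        reverse=True,
    )
    return sorted(peak for peak, _ in ranked[:k])  # back in m/z order
```

In this sketch, several weakly informative features each contribute a small log-likelihood ratio, and their sum decides the ranking, which mirrors the abstract's point about accumulating weak evidence. The learned conditional probabilities in `cond` can be read directly, and the regression replaces a fixed cutoff on the number of retained peaks.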

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem*
  • Databases, Protein
  • Peptide Fragments / chemistry
  • Peptide Fragments / isolation & purification
  • Peptides / chemistry
  • Peptides / isolation & purification*
  • Proteomics / methods*
  • Signal-To-Noise Ratio
  • Software
  • Tandem Mass Spectrometry / methods*

Substances

  • Peptide Fragments
  • Peptides