Methods to Calculate Spectrum Similarity

Methods Mol Biol. 2017:1549:75-100. doi: 10.1007/978-1-4939-6740-7_7.

Abstract

Scoring functions that assess spectrum similarity play a crucial role in many computational mass spectrometry algorithms. These functions compare an experimentally acquired fragmentation (MS/MS) spectrum against one of two types of target spectrum: a theoretical MS/MS spectrum derived from a peptide in a sequence database, or another, previously acquired MS/MS spectrum. The former is typically encountered in database searching, while the latter is used in spectrum clustering and spectral library searching. Acquired and theoretical MS/MS spectra are most commonly compared using cross-correlation or probability-derived scoring functions, while two acquired MS/MS spectra are typically compared using a normalized dot product, especially in spectral library search algorithms. In addition to these scoring functions, Pearson's or Spearman's correlation coefficients, mean squared error, or median absolute deviation scores can also be used for the same purpose. Here, we describe and evaluate these scoring functions with regard to their ability to assess spectrum similarity for theoretical versus acquired, and acquired versus acquired spectra.

Keywords: Database searching; Mass spectrometry; Scoring functions; Spectrum library; Spectrum similarity.

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Databases, Protein
  • Proteome
  • Proteomics / methods*
  • ROC Curve
  • Reproducibility of Results
  • Software
  • Tandem Mass Spectrometry* / methods
  • Tandem Mass Spectrometry* / standards
  • Web Browser
  • Workflow

Substances

  • Proteome