Determining the overall merit of protein identification data sets: rho-diagrams and rho-scores

J Proteome Res. 2007 May;6(5):1997-2004. doi: 10.1021/pr070025y. Epub 2007 Mar 31.

Abstract

This paper describes a simple heuristic method for determining the merit of a set of peptide sequence assignments made using tandem mass spectra. The method involves comparing a prediction based on the known stochastic behavior of a sequence assignment algorithm with the assignments generated from a particular data set. A particular formulation of this comparison is defined through the construction of a plot of the data, the rho-diagram, as well as a parameter derived from this plot, the rho-score. This plot and parameter are shown to readily characterize the relative quality of a set of peptide sequence assignments and to allow the straightforward determination of probability threshold values for the interpretation of proteomics data. The plot is independent of the algorithm or scoring scheme used to estimate the statistical significance of a set of experimental results; rather, it can be used as an objective test of the correctness of those estimates. The rho-score can also be used as a parameter to evaluate the relative merit of protein identifications, such as those made across proteome species taxonomic categories.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Data Interpretation, Statistical*
  • Databases, Protein
  • Humans
  • Models, Statistical
  • Peptides* / chemistry
  • Peptides* / genetics
  • Proteins* / chemistry
  • Proteins* / genetics
  • Proteomics
  • Tandem Mass Spectrometry

Substances

  • Peptides
  • Proteins