An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates

Proteomics. 2011 Oct;11(20):4085-95. doi: 10.1002/pmic.201000665. Epub 2011 Sep 7.

Abstract

The relevance of libraries of annotated MS/MS spectra is growing with the amount of proteomic data generated in high-throughput experiments. These reference libraries provide a fast and accurate way to identify newly acquired MS/MS spectra. In the context of multiple hypothesis testing, the number of false-positive identifications expected in the final result list is controlled by calculating the false discovery rate (FDR). In a classical sequence search, where experimental MS/MS spectra are compared with theoretical peptide spectra calculated from a sequence database, the FDR is estimated by searching randomized or decoy sequence databases. Despite ongoing discussion on exactly how the FDR should be calculated, this method is widely accepted in the proteomic community. Recently, similar approaches to controlling the FDR of spectrum library searches have been discussed. In this paper, we present a detailed analysis of the similarity between spectra of distinct peptides, which serves as the basis for our own solution for decoy library creation (DeLiberator). It differs from previously published methods in several key points, mainly by implementing new methods that prevent decoy spectra from being too similar to the original library spectra while keeping important features of real MS/MS spectra. Using different proteomic data sets and library creation methods, we evaluate our approach and compare it with alternative methods.
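The two ideas the abstract relies on, estimating the FDR from decoy hits and keeping decoy spectra from resembling library spectra too closely, can be illustrated with a minimal sketch. This is not the DeLiberator algorithm: the binned cosine similarity (normalized dot product), the 1.0 m/z bin width, the 0.7 similarity cutoff and all function names are illustrative assumptions. The FDR estimate shown (decoy hits divided by target hits above a score threshold, from separate searches) is only one common variant, consistent with the ongoing discussion on how the FDR should be calculated.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A spectrum is a list of (m/z, intensity) peaks.
Spectrum = List[Tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    bins: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def dot_product(a: Spectrum, b: Spectrum, bin_width: float = 1.0) -> float:
    """Normalized dot product (cosine similarity) of two binned spectra, in [0, 1]."""
    ba, bb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    num = sum(v * bb.get(k, 0.0) for k, v in ba.items())
    norm = math.sqrt(sum(v * v for v in ba.values())) * math.sqrt(
        sum(v * v for v in bb.values())
    )
    return num / norm if norm > 0 else 0.0


def too_similar(decoy: Spectrum, library: Sequence[Spectrum], max_sim: float = 0.7) -> bool:
    """Hypothetical guard: reject a candidate decoy that resembles any library spectrum."""
    return any(dot_product(decoy, ref) > max_sim for ref in library)


def estimate_fdr(
    target_scores: Sequence[float], decoy_scores: Sequence[float], threshold: float
) -> float:
    """One common target-decoy estimate: #decoys / #targets at or above threshold."""
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    # With no accepted targets, report 0.0 rather than dividing by zero.
    return n_decoys / n_targets if n_targets else 0.0
```

For example, a decoy that shares only one of three fragment peaks with its library spectrum, `dot_product([(175.1, 50.0), (262.1, 100.0), (375.2, 80.0)], [(175.1, 50.0), (290.2, 100.0), (403.2, 80.0)])`, scores about 0.13 and passes the guard, while `estimate_fdr([10, 8, 7, 5, 3], [6, 2, 1], 5)` gives 1 decoy over 4 targets, i.e. 0.25.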

MeSH terms

  • Algorithms*
  • Animals
  • Databases, Protein
  • Genetic Association Studies
  • Humans
  • Peptides / chemistry*
  • Proteomics / methods*
  • Software*
  • Tandem Mass Spectrometry*

Substances

  • Peptides