RT-SVR+q: a strategy for post-Mascot analysis using retention time and q value metric to improve peptide and protein identifications

J Proteomics. 2011 Dec 21;75(2):480-90. doi: 10.1016/j.jprot.2011.08.013. Epub 2011 Aug 24.

Abstract

Shotgun proteomics commonly relies on database search engines such as Mascot to identify proteins from tandem mass (MS/MS) spectra. The false discovery rate (FDR) is often used to assess the confidence of peptide identifications. However, the widely accepted FDR threshold of 1% improves accuracy at the cost of sensitivity. This article details a machine learning approach that combines a retention time based support vector regressor (RT-SVR) with q value based statistical analysis to improve peptide and protein identifications with high sensitivity and accuracy. Using confident peptide identifications as training examples, together with careful feature selection, ensures high R values (>0.900) for all models. Applying the RT-SVR model to Mascot results (p=0.10) increases the sensitivity of peptide identification. The q value, as a function of the deviation between predicted and experimental RTs (ΔRT), is used to assess the significance of peptide identifications. We demonstrate that peptide and protein identifications increase by up to 89.4% and 83.5%, respectively, at a specified q value of 0.01 when the method is applied to proteomic analysis of the natural killer leukemia cell line (NKL). This study establishes an effective methodology and provides a platform for profiling confident proteomes in more relevant species, as well as for future investigation of accurate protein quantification.
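As a rough illustration of how a q value can be expressed as a function of ΔRT, the sketch below ranks peptide-spectrum matches by |ΔRT| and converts a running decoy/target ratio into q values. It is not the authors' implementation: the abstract does not state how the FDR is estimated, so the use of target-decoy labels is an assumption, and the function name `q_values_by_delta_rt` and the input layout are hypothetical. The RT-SVR prediction step itself is not shown; the ΔRT values are taken as given.

```python
from typing import List, Tuple


def q_values_by_delta_rt(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return one q value per match, in input order.

    Each item of `psms` is (abs_delta_rt, is_decoy), where abs_delta_rt is
    |predicted RT - experimental RT|. Decoy labels are an assumption of this
    sketch, not something stated in the abstract.
    """
    # Rank matches from smallest to largest |delta RT| (most to least plausible).
    order = sorted(range(len(psms)), key=lambda i: psms[i][0])

    # Estimated FDR at each rank: decoys accepted / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q value: the smallest FDR at this |delta RT| threshold or any looser one,
    # which makes the values non-decreasing along the ranking.
    for k in range(len(fdrs) - 2, -1, -1):
        fdrs[k] = min(fdrs[k], fdrs[k + 1])

    # Map the ranked q values back to the original input order.
    q = [0.0] * len(psms)
    for rank, i in enumerate(order):
        q[i] = fdrs[rank]
    return q


# Toy input with made-up numbers, for illustration only.
example = [(0.5, False), (1.0, False), (2.0, True), (3.0, False), (9.0, True)]
accepted = [p for p, q in zip(example, q_values_by_delta_rt(example)) if q <= 0.01]
```

Filtering at `q <= 0.01` mirrors the specified q value of 0.01 quoted above; in this toy input only the two matches with the smallest |ΔRT| pass.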

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Artificial Intelligence
  • Databases, Protein*
  • Peptides / analysis
  • Proteins / analysis*
  • Proteomics / methods*
  • Support Vector Machine*
  • Tandem Mass Spectrometry / methods*

Substances

  • Peptides
  • Proteins