[A new peptide retention time prediction method for mass spectrometry based proteomic analysis by a serial and parallel support vector machine model]

Se Pu. 2012 Sep;30(9):857-63. doi: 10.3724/sp.j.1123.2012.06021.
[Article in Chinese]

Abstract

Online reversed-phase liquid chromatography (RPLC) contributes substantially to large-scale, mass spectrometry-based protein identification in proteomics. Retention time (RT) is an important piece of evidence that can be used to distinguish false-positive from true-positive peptide identifications. However, because the organic-phase concentration varies nonlinearly over the whole run time and because peptides interact with one another, sequence-based prediction of peptide RT has low accuracy and is difficult to generalize in practice, which makes it less effective for validating peptide identifications. A serial and parallel support vector machine (SP-SVM) method was proposed to characterize the nonlinear effect of the organic-phase concentration and the interactions among peptides. The SP-SVM contains one support vector regression (SVR) model used only for model training (named p-SVR) and four SVM models (named C-SVM, 1-SVR, s-SVR and n-SVR) used for RT prediction. After C-SVM distinguished the chromatographic behavior of a peptide, 1-SVR and s-SVR were used to predict the peptide RT specifically in order to improve accuracy. The peptide RT was then normalized by n-SVR to characterize the peptide interactions. Applying this method to a complex sample dataset improved the prediction accuracy significantly. The coefficient of determination between predicted and experimental RTs reached 0.95; the prediction error was less than 20% of the total LC run time in more than 95% of cases and less than 10% of the total LC run time in more than 70% of cases. This performance is the best reported so far. More importantly, the SP-SVM method provides a framework for taking the interactions among peptides in chromatographic separation into account, and its performance can be improved further by introducing new data processing and experimental strategies.
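
The accuracy figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the coefficient of determination is computed as 1 - SS_res / SS_tot and that "prediction error" means the absolute difference between predicted and experimental RT; the abstract specifies neither. The function names, the toy retention times and the 100-minute run time are hypothetical and are not taken from the paper.

```python
def r_squared(experimental, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_exp = sum(experimental) / len(experimental)
    ss_res = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    return 1.0 - ss_res / ss_tot


def fraction_within(experimental, predicted, run_time, tolerance):
    """Share of peptides whose absolute RT error is below tolerance * run_time."""
    limit = tolerance * run_time
    hits = sum(1 for e, p in zip(experimental, predicted) if abs(e - p) < limit)
    return hits / len(experimental)


# Hypothetical toy data in minutes, not taken from the paper.
experimental_rt = [10.0, 25.0, 40.0, 55.0, 70.0]
predicted_rt = [12.0, 24.0, 52.0, 50.0, 95.0]
run_time = 100.0  # assumed total LC run time in minutes

print("R^2:", r_squared(experimental_rt, predicted_rt))
print("within 20% of run time:",
      fraction_within(experimental_rt, predicted_rt, run_time, 0.20))
print("within 10% of run time:",
      fraction_within(experimental_rt, predicted_rt, run_time, 0.10))
```

Expressing the error as a fraction of the total run time, as the abstract does, makes results comparable across LC gradients of different lengths.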

Publication types

  • English Abstract
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Forecasting
  • Mass Spectrometry / methods*
  • Peptides / chemistry*
  • Proteome / analysis*
  • Proteomics / methods*
  • Support Vector Machine*
  • Time Factors

Substances

  • Peptides
  • Proteome