Charger: combination of signal processing and statistical learning algorithms for precursor charge-state determination from electron-transfer dissociation spectra

Anal Chem. 2008 Jan 15;80(2):376-86. doi: 10.1021/ac071332q. Epub 2007 Dec 15.

Abstract

Tandem mass spectrometry in combination with liquid chromatography has emerged as a powerful tool for characterization of complex protein mixtures in a high-throughput manner. One of the bioinformatics challenges posed by the mass spectral data analysis is the determination of precursor charge when unit mass resolution is used for detecting fragment ions. The charge-state information is used to filter database sequences before they are correlated to experimental data. In the absence of the accurate charge state, several charge states are assumed. This dramatically increases database search times. To address this problem, we have developed an approach for charge-state determination of peptides from their tandem mass spectra obtained in fragmentations via electron-transfer dissociation (ETD) reactions. Protein analysis by ETD is thought to enhance the range of amino acid sequences that can be analyzed by mass spectrometry-based proteomics. One example is the improved capability to characterize phosphorylated peptides. Our approach to charge-state determination uses a combination of signal processing and statistical machine learning. The signal processing employs correlation and convolution analyses to determine precursor masses and charge states of peptides. We discuss applicability of these methods to spectra of different charge states. We note that in our applications correlation analysis outperforms the convolution in determining peptide charge states. The correlation analysis is best suited for spectra with prevalence of complementary ions. It is highly specific but is dependent on quality of spectra. The linear discriminant analysis (LDA) approach uses a number of other spectral features to predict charge states. We train LDA classifier on a set of manually curated spectral data from a mixture of proteins of known identity. There are over 5000 spectra in the training set. A number of features, pertinent to spectra of peptides obtained via ETD reactions, have been used in the training. The loading coefficients of LDA indicate the relative importance of different features for charge-state determination. We have applied our model to a test data set generated from a mixture of 49 proteins. We search the spectra with and without use of the charge-state determination. The charge-state determination helps to significantly save the database search times. We discuss the cost associated with the possible misclassification of charge states.

MeSH terms

  • Algorithms
  • Animals
  • Artificial Intelligence
  • Chromatography, Liquid
  • Cytochromes c / chemistry
  • Data Interpretation, Statistical
  • Databases, Protein
  • Electrons
  • Peptides / chemistry
  • ROC Curve
  • Signal Processing, Computer-Assisted
  • Tandem Mass Spectrometry / instrumentation*
  • Tandem Mass Spectrometry / statistics & numerical data

Substances

  • Peptides
  • Cytochromes c