Filtering strategies for improving protein identification in high-throughput MS/MS studies

Jussi Salmi; Tuula A Nyman; Olli S Nevalainen; Tero Aittokallio

doi:10.1002/pmic.200800517

Proteomics. 2009 Feb;9(4):848-60. doi: 10.1002/pmic.200800517.

Authors

Jussi Salmi¹, Tuula A Nyman, Olli S Nevalainen, Tero Aittokallio

Affiliation

¹ Department of Information Technology, University of Turku, Turku, Finland. jussi.salmi@utu.fi

PMID: 19160393
DOI: 10.1002/pmic.200800517

Abstract

Despite recent advances in streamlining high-throughput proteomic pipelines using tandem mass spectrometry (MS/MS), reliable identification of peptides and proteins on a larger scale has remained a challenging task, still involving a considerable degree of user interaction. Recently, a number of papers have proposed computational strategies both for distinguishing poor MS/MS spectra prior to database search (pre-filtering) and for verifying the peptide identifications made by the search programs (post-filtering). Both of these filtering approaches can be very beneficial to the overall protein identification pipeline, since they can remove a substantial part of the time-consuming manual validation work and convert large sets of MS/MS spectra into more reliable and interpretable proteome information. The choice of filtering method depends both on the properties of the data and on the goals of the experiment. This review discusses the different pre- and post-filtering strategies available to researchers, together with their relative merits and potential pitfalls. We also highlight some additional research topics, such as spectral denoising and statistical assessment of the identification results, which aim at further improving the coverage and accuracy of high-throughput protein identification studies.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Algorithms
Artificial Intelligence*
Cluster Analysis
Computer Simulation
Data Interpretation, Statistical
Fourier Analysis
Models, Statistical
Peptides / chemistry
Proteins / chemistry*
Proteomics / methods*
Tandem Mass Spectrometry / methods*

Substances

Peptides
Proteins