Bayesian Proteoform Modeling Improves Protein Quantification of Global Proteomic Measurements

Mol Cell Proteomics. 2014 Aug 16:mcp.O113.030932. doi: 10.1074/mcp.O113.030932. Online ahead of print.

Abstract

As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation to existing computationally-driven protein quantification methods is that most ignore protein variation, such as alternate splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses, such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian model (BP-Quant) that uses statistically derived peptides signatures to identify peptides that are outside the dominant pattern, or the existence of multiple over-expressed patterns to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves similar accuracy as the current state-of-the-art methods at proteoform identification with significantly better specificity. BP-Quant is available as a MatLab ® and R packages at https://github.com/PNNL-Comp-Mass-Spec/BP-Quant.

Keywords: Bioinformatics; Biomarkers; Computational Biology; Quantification; Statistics; Tandem Mass Spectrometry.