PGCA: An algorithm to link protein groups created from MS/MS data

David Kepplinger; Mandeep Takhar; Mayu Sasaki; Zsuzsanna Hollander; Derek Smith; Bruce McManus; W Robert McMaster; Raymond T Ng; Gabriela V Cohen Freue

doi:10.1371/journal.pone.0177569

PLoS One. 2017 May 31;12(5):e0177569. doi: 10.1371/journal.pone.0177569. eCollection 2017.

Authors

David Kepplinger¹, Mandeep Takhar², Mayu Sasaki², Zsuzsanna Hollander², Derek Smith³, Bruce McManus², W Robert McMaster⁴, Raymond T Ng²,⁵, Gabriela V Cohen Freue¹

Affiliations

¹ Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada.
² NCE CECR PROOF Centre of Excellence, Vancouver, British Columbia, Canada.
³ University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia, Canada.
⁴ Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada.
⁵ Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada.

Abstract

The quantitation of proteins using shotgun proteomics has gained popularity in recent decades, simplifying sample handling procedures, removing extensive protein separation steps, and achieving a relatively high-throughput readout. The process starts with the digestion of the protein mixture into peptides, which are then separated by liquid chromatography and sequenced by tandem mass spectrometry (MS/MS). At the end of the workflow, recovering the identity of the proteins originally present in the sample is often difficult and ambiguous, because more than one protein identifier may match a set of peptides identified from the MS/MS spectra. To address this identification problem, many MS/MS data processing software tools combine all plausible protein identifiers matching a common set of peptides into a protein group. However, this solution introduces new challenges in studies with multiple experimental runs, which can be characterized by three main factors: i) the identifiers of protein groups are local, i.e., they vary from run to run, ii) the composition of each group may change across runs, and iii) the supporting evidence for the proteins within each group may also change across runs. Since, in general, there is no conclusive evidence that any protein in a group is absent from the sample, protein groups need to be linked across runs before subsequent statistical analyses. We propose an algorithm, called the Protein Group Code Algorithm (PGCA), to link groups from multiple experimental runs by forming global protein groups from connected local groups. The algorithm is computationally inexpensive and enables the connection and analysis of lists of protein groups across runs, as needed in biomarker studies. We illustrate the identification problem and the stability of the PGCA mapping using 65 iTRAQ experimental runs. Further, we use two biomarker studies to show how PGCA enables the discovery of relevant candidate protein group markers with similar but non-identical compositions in different runs.
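The idea of forming global groups from connected local groups can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define when two local groups count as connected, so the sketch assumes they are connected whenever they share at least one protein identifier, and it finds the connected components with a union-find structure. The function name `link_protein_groups`, the input layout, and the accessions `P1` to `P5` are hypothetical.

```python
def link_protein_groups(runs):
    """Map each (run_id, local_group_id) to a global group code.

    runs: dict of run_id -> dict of local_group_id -> list of protein
    identifiers. ASSUMPTION: two local groups are connected when they
    share at least one identifier; the abstract does not state the rule.
    """
    parent = {}

    def find(x):
        # Return the representative of x, registering x on first sight.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge all identifiers that appear together in a local group.
    for groups in runs.values():
        for identifiers in groups.values():
            for identifier in identifiers[1:]:
                union(identifiers[0], identifier)

    # Give every connected component one global code, in order of appearance.
    codes = {}
    mapping = {}
    for run_id, groups in runs.items():
        for local_id, identifiers in groups.items():
            if not identifiers:
                continue  # an empty local group cannot be linked
            root = find(identifiers[0])
            mapping[(run_id, local_id)] = codes.setdefault(root, len(codes) + 1)
    return mapping


# Hypothetical toy data: local group ids are only meaningful within a run.
runs = {
    "run1": {1: ["P1", "P2"], 2: ["P3"]},
    "run2": {1: ["P2", "P4"], 2: ["P5"]},
    "run3": {7: ["P4", "P3"]},
}
mapping = link_protein_groups(runs)
```

In this toy input, group 7 of `run3` shares `P4` with group 1 of `run2` and `P3` with group 2 of `run1`, so those local groups all receive the same global code, while the group containing only `P5` keeps a code of its own. Union-find keeps the linking close to linear in the total number of identifiers, which fits the abstract's description of the algorithm as computationally inexpensive.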

MeSH terms

Algorithms*
Amino Acid Sequence
Biomarkers
Heart Transplantation
Humans
Muscular Dystrophies / metabolism
Proteins / chemistry*
Proteomics
Sequence Homology, Amino Acid
Tandem Mass Spectrometry / methods*

Substances

Biomarkers
Proteins

Grants and funding

The author(s) received no specific funding for this work.