MinProtMaxVP: Generating a minimized number of protein variant sequences containing all possible variant peptides for proteogenomic analysis

J Proteomics. 2020 Jul 15:223:103819. doi: 10.1016/j.jprot.2020.103819. Epub 2020 May 12.

Abstract

Identifying single-amino-acid variants (SAVs) in mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level and thereby facilitating biomedical research. Currently, two approaches are commonly used to convert SNV annotations into SAV-harboring protein sequences: one generates sequences that each contain exactly one SAV, and the other generates a sequence containing all SAVs. However, both may miss SAV combinations, e.g., haplotypes, that exist in biological samples. It is therefore necessary to consider all SAV combinations of a protein when generating SAV-harboring protein sequences. In this paper, we propose MinProtMaxVP, a novel approach that selects a minimized number of the SAV-harboring protein sequences generated by the exhaustive approach while still accommodating all possible variant peptides, by solving a classic set covering problem. Our study of known haplotype variations of TAS2R38 justifies the need for MinProtMaxVP to consider all combinations of SAVs. The performance of MinProtMaxVP is demonstrated by an in silico study of OR2T27 with five SAVs and by real experimental data from the HEK293 cell line. Furthermore, a study with simulated somatic and germline variants of OR2T27 in tumor and normal tissues demonstrates that, when an appropriate somatic and germline SAV integration strategy is adopted, MinProtMaxVP is adaptable to both labeling and label-free mass spectrometry-based experiments. SIGNIFICANCE: We present MinProtMaxVP, a novel approach for generating SAV-harboring protein sequences to construct a customized protein sequence database, which is used in database searching for variant peptide identification. This approach outperforms existing approaches in including all possible variant peptides in the generated protein sequences, which may lead to the identification of more variant peptides in proteogenomic analysis.
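The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy reference sequence, the three SAVs, the simplified tryptic digestion (cleavage after K or R, no missed cleavages), and the function names are all assumptions made for illustration. The set covering problem is solved here with the standard greedy heuristic, which gives a small cover but does not guarantee the minimum in general; the abstract does not state which solver MinProtMaxVP uses.

```python
from itertools import product

def apply_savs(ref, savs, mask):
    """Return ref with the SAVs switched on in mask applied.
    savs is a list of (0-based position, alternative residue)."""
    seq = list(ref)
    for (pos, alt), on in zip(savs, mask):
        if on:
            seq[pos] = alt
    return "".join(seq)

def tryptic_peptides(seq):
    """Simplified digestion (assumption): cleave after every K or R,
    no missed cleavages, no proline rule."""
    peptides, start = set(), 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            peptides.add(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.add(seq[start:])
    return peptides

def variant_peptides(ref, seq):
    """Peptides of seq that do not occur in the reference digest."""
    return tryptic_peptides(seq) - tryptic_peptides(ref)

def greedy_cover(ref, savs):
    """Enumerate all non-empty SAV combinations (the exhaustive approach),
    then greedily pick sequences until every variant peptide is covered."""
    candidates = {}
    for mask in product((0, 1), repeat=len(savs)):
        if any(mask):
            seq = apply_savs(ref, savs, mask)
            candidates[mask] = variant_peptides(ref, seq)
    universe = set().union(*candidates.values())
    uncovered, selected = set(universe), []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered peptides.
        best = max(candidates, key=lambda m: len(candidates[m] & uncovered))
        selected.append(best)
        uncovered -= candidates[best]
    return selected, candidates, universe

# Hypothetical toy protein; two of the three SAVs fall in the same peptide.
ref = "MAGKTLSRAEEKVLNR"
savs = [(1, "V"), (5, "F"), (6, "A")]
selected, candidates, universe = greedy_cover(ref, savs)
for mask in selected:
    print(mask, apply_savs(ref, savs, mask), sorted(candidates[mask]))
```

In this toy case the exhaustive approach yields seven sequences, and the greedy selection keeps three of them while covering all four variant peptides. The one-SAV-per-sequence approach misses the peptide carrying both neighboring SAVs (TFAR), and the all-SAVs sequence misses the two peptides carrying only one of them (TFSR and TLAR), which mirrors the abstract's argument for considering SAV combinations.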

Keywords: Proteogenomic analysis; SAV-harboring protein sequences; Single amino acid variants (SAVs); Single nucleotide variants (SNVs); Variant peptides.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Databases, Protein
  • HEK293 Cells
  • Humans
  • Peptides / genetics
  • Proteogenomics*

Substances

  • Peptides