A Meta-proteogenomic Approach to Peptide Identification Incorporating Assembly Uncertainty and Genomic Variation

Mol Cell Proteomics. 2019 Aug 9;18(8 suppl 1):S183-S192. doi: 10.1074/mcp.TIR118.001233. Epub 2019 May 29.

Abstract

Matched metagenomic and/or metatranscriptomic data, which are currently often underused, can serve as a useful reference for the analysis of metaproteomic tandem mass spectrometry (MS/MS) data. Here we developed a software pipeline that identifies peptides and proteins from metaproteomic MS/MS data, using proteins derived from the matched metagenomic (and metatranscriptomic) data as the search database. The pipeline is based on two approaches: Graph2Pro (previously published) and Var2Pep (new). Graph2Pro retains the uncertainties of metagenome assembly and uses them in reference-based MS/MS data analysis. Var2Pep considers the variations found in metagenomic/metatranscriptomic sequencing reads that are not retained in the assemblies (contigs). The new software pipeline provides a one-stop application of both tools, and it supports metagenome assemblies from commonly used assemblers, including MegaHit and metaSPAdes. When tested on two collections of multi-omic microbiome data sets, our pipeline significantly improved the identification rate of the metaproteomic MS/MS spectra, by about twofold compared with conventional contig- or read-based approaches (Var2Pep alone identified 5.6% to 24.1% more unique peptides, depending on the data set). We also showed that the identified variant peptides are important for functional profiling of microbiomes. Together, these results suggest that it is important to take assembly uncertainties and genomic variants into consideration to facilitate metaproteomic MS/MS data interpretation.
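The idea of adding variant peptides to a search database can be illustrated with a minimal Python sketch. This is not the Var2Pep implementation; it only assumes that amino acid substitutions have already been called from the reads, and it uses a simplified trypsin rule (cleave after K or R unless the next residue is P). The function names, the `min_len` parameter, and the example protein are hypothetical and chosen for illustration.

```python
def tryptic_peptides(protein, min_len=6):
    """Simplified in-silico trypsin digest: cleave after K or R unless
    the next residue is P. Returns peptides of at least min_len residues."""
    peptides = []
    start = 0
    last = len(protein) - 1
    for i, aa in enumerate(protein):
        at_end = i == last
        cleave = aa in "KR" and not at_end and protein[i + 1] != "P"
        if cleave or at_end:
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def variant_peptides(protein, substitutions, min_len=6):
    """Return (reference peptides, variant-only peptides).

    substitutions: iterable of (0-based position, alternative amino acid),
    assumed here to come from variants observed in the sequencing reads.
    Each substitution is applied on its own to the reference protein.
    """
    reference = set(tryptic_peptides(protein, min_len))
    variants = set()
    for pos, alt in substitutions:
        if protein[pos] == alt:
            continue  # not a real change
        mutated = protein[:pos] + alt + protein[pos + 1:]
        for peptide in tryptic_peptides(mutated, min_len):
            if peptide not in reference:
                variants.add(peptide)
    return reference, variants


# Hypothetical contig-derived protein with two read-supported substitutions.
ref, var = variant_peptides(
    "MKTAYIAKQRQISFVKSHFSR", [(4, "F"), (7, "E")], min_len=5
)
print(sorted(var))  # ['TAFIAK', 'TAYIAEQR']
```

The second substitution replaces a lysine and so removes a cleavage site, which yields a longer variant peptide. Only the variant-only peptides would need to be appended to the contig-derived database in a sketch like this.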

Keywords: Bioinformatics; Bioinformatics software; Data evaluation; Database design; Microbiome; Assembly graph; Genomic variation; Metaproteomics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Databases, Protein
  • Genetic Variation
  • Microbiota / genetics*
  • Peptides / genetics
  • Proteogenomics / methods*
  • Seawater / microbiology*
  • Tandem Mass Spectrometry
  • Wastewater / microbiology*

Substances

  • Peptides
  • Waste Water