Pool-seq driven proteogenomic database for Group G Streptococcus

J Proteomics. 2019 Jun 15;201:84-92. doi: 10.1016/j.jprot.2019.04.015. Epub 2019 Apr 20.

Abstract

Proteogenomic databases use genomic and transcriptomic information to improve the identification of peptides and proteins in mass spectrometry analyses. One application of such databases is the discovery of variants and mutations. In this study, we created a proteogenomic database containing variant sequences derived from pooled sequencing (Pool-seq) experiments (137 Group G Streptococcus strains sequenced in 3 pools) and used tandem mass spectrometry (MS/MS) to analyse eight protein samples from randomly selected strains included in the pools. Using the proteogenomic variant database, we identified 385 variant peptides from the eight samples. None of these could be identified with the conventional single-genome database used, whereas 71.2% and 93.5% of them were identified with the databases containing 4 complete genomes and 26 assemblies, respectively. The proteogenomic variant databases exhibited the same properties as the conventional databases in terms of the Andromeda score distributions and the posterior error probability (PEP) values of the identified peptides.

SIGNIFICANCE: For bacterial populations with substantial intra-species diversity, such as Group G Streptococcus (GGS), simultaneously sequencing large numbers of strains and generating proteogenomic databases from those sequences helps improve the discovery of peptides in mass spectrometric analyses. Generating proteogenomic variant protein databases from pooled sequencing experiments can therefore be a cost-effective way to complement conventional databases and discover subtle strain-wise differences.
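The abstract does not describe how pooled-sequencing variants are turned into searchable protein sequences. The following is a minimal Python sketch of one common approach, not the authors' pipeline. It assumes variants have already been called and translated into single amino-acid substitutions; the `AASubstitution` type, the toy protein, the substitutions, and all function names are hypothetical. Each substitution is applied to a reference protein, both sequences are digested with a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages, an assumed 6 to 30 residue window), and only peptides that appear in the variant digest but not in the reference digest are kept.

```python
"""Hypothetical sketch: variant protein entries from pooled-sequencing substitutions.

Assumptions (not from the paper): variants are already translated into single
amino-acid substitutions, and trypsin follows the simple K/R-not-before-P rule.
"""
from __future__ import annotations

import re
from typing import NamedTuple


class AASubstitution(NamedTuple):
    """A single amino-acid change at a 1-based position (hypothetical format)."""
    position: int
    ref: str
    alt: str


def apply_substitution(protein: str, sub: AASubstitution) -> str:
    """Return the protein with one substitution applied, checking the reference residue."""
    if not 1 <= sub.position <= len(protein):
        raise ValueError(f"position {sub.position} outside protein of length {len(protein)}")
    idx = sub.position - 1
    if protein[idx] != sub.ref:
        raise ValueError(f"expected {sub.ref} at {sub.position}, found {protein[idx]}")
    return protein[:idx] + sub.alt + protein[idx + 1:]


def tryptic_peptides(protein: str, min_len: int = 6, max_len: int = 30) -> set[str]:
    """Fully tryptic peptides (no missed cleavages) within a length window."""
    # Zero-width split: after K or R, unless the next residue is P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def variant_peptides(protein: str, subs: list[AASubstitution]) -> dict[AASubstitution, set[str]]:
    """For each substitution, the peptides absent from the reference digest."""
    reference = tryptic_peptides(protein)
    return {sub: tryptic_peptides(apply_substitution(protein, sub)) - reference for sub in subs}


def to_fasta(protein_id: str, protein: str, subs: list[AASubstitution]) -> str:
    """One FASTA entry per variant protein, with the substitution encoded in the header."""
    lines = []
    for sub in subs:
        lines.append(f">{protein_id}_{sub.ref}{sub.position}{sub.alt}")
        lines.append(apply_substitution(protein, sub))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy reference protein and substitutions, invented for illustration only.
    toy_protein = "MSTEVLAGKTDLQNPAFIRGSWEELVNK"
    toy_subs = [AASubstitution(11, "D", "N"), AASubstitution(20, "G", "P")]
    for sub, peptides in variant_peptides(toy_protein, toy_subs).items():
        print(sub, sorted(peptides))
    print(to_fasta("toyProt", toy_protein, toy_subs))
```

In the toy example, D11N changes a peptide's sequence, while G20P places a proline after R19 and removes a cleavage site, producing a longer peptide. Neither peptide is present in the reference digest, so a search against a single reference sequence could not match it. Real pipelines also handle indels, missed cleavages, and variants shared across pools; those are omitted here.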

Keywords: Group G Streptococcus; Pool-seq; Proteogenomics; Variant protein database.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins* / genetics
  • Bacterial Proteins* / metabolism
  • Databases, Protein*
  • Genome, Bacterial*
  • Proteogenomics*
  • Streptococcus* / genetics
  • Streptococcus* / metabolism

Substances

  • Bacterial Proteins