Proteogenomics for environmental microbiology

Proteomics. 2013 Oct;13(18-19):2731-42. doi: 10.1002/pmic.201200576. Epub 2013 Jun 18.

Abstract

Proteogenomics sensu stricto refers to the use of proteomic data to refine the annotation of genomes from model organisms. Because of the limitations of automatic annotation pipelines, a relatively high number of errors occurs during the structural annotation of protein-coding genes. Whether putative orphan sequences or short genes encoding low-molecular-weight proteins really exist often remains unclear. Whether start codons are correctly assigned is also open to debate. These problems are exacerbated for genomes of microorganisms belonging to poorly documented genera, as related sequences are not always available for homology-guided annotation. The functional annotation of a significant proportion of genes is another well-known issue when annotating environmental microorganisms. High-throughput shotgun proteomics has advanced greatly in recent years, allowing the proteome of any microorganism to be explored at unprecedented depth. The structural and functional annotation process can therefore be usefully complemented with experimental data. Indeed, proteogenomic mapping has been successfully performed for a wide variety of organisms. Specific approaches devoted to systematically establishing the N-termini of a large set of proteins are being developed. N-terminomics is giving rise to datasets of experimentally proven translational start codons as well as validated signal peptides for secreted proteins. By extension, combining genomic and proteomic data is becoming routine in many research projects. The proteomic analysis of organisms with unfinished genome sequences, the so-called composite proteomics, and the search for microbial biomarkers by combined bottom-up and top-down approaches are some examples of proteogenomic-flavored studies. They illustrate the advent of a new era of environmental microbiology where proteomics and genomics are intimately integrated to answer key biological questions.
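As an illustration of the proteogenomic mapping mentioned in the abstract, the following minimal Python sketch translates a nucleotide sequence in all six reading frames and reports where an already identified peptide maps on the genome. It is not taken from the review. The function names (`reverse_complement`, `translate`, `map_peptide`) and the toy sequence are hypothetical. The sketch assumes an uppercase DNA string, the standard genetic code, and exact string matching, whereas real studies typically match tandem mass spectra against a six-frame translated database with a dedicated search engine.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# no organism-specific recoding; '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Locate a peptide in the six-frame translation of a genome.

    Returns a list of (strand, frame, start, end) tuples. Coordinates are
    0-based, half-open, and always given on the forward strand. For '-'
    hits, the frame is counted from the start of the reverse complement.
    """
    hits = []
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    start, end = n - end, n - start
                hits.append((strand, frame, start, end))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy example: a short ORF (ATG AAA ACC GCG TAT TAA) embedded in flanking bases.
toy_genome = "CCATGAAAACCGCGTATTAAG"
print(map_peptide(toy_genome, "KTAY"))  # [('+', 2, 5, 17)]
```

A peptide that maps outside any annotated gene, or upstream of an annotated start codon in the same frame, is the kind of evidence that can prompt a new gene model or a corrected translational start site. Comparing hits against an existing annotation is left out of this sketch.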

Keywords: Genome annotation; High-throughput proteomics; Microbiology; N-Terminomics; Proteogenomics; Translational start site.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Bacteria / genetics
  • Bacteria / metabolism
  • Environmental Microbiology*
  • Genome, Bacterial
  • Molecular Sequence Annotation
  • Proteomics / methods*
  • Sequence Analysis, DNA