From the genome sequence to the proteome and back: evaluation of E. coli genome annotation with a 2-D gel-based proteomics approach

Proteomics. 2007 Apr;7(7):1097-106. doi: 10.1002/pmic.200600599.

Abstract

The ambition of systems biology to understand complex biological systems at the molecular level implies that we need to have a concrete and correct understanding of each molecular entity and its function. However, even for the best-studied organism, Escherichia coli, a large number of proteins have never been identified and characterised from wild-type cells, and/or await unravelling of their biological role. Instead, the ORF models for these proteins have been predicted by suitable algorithms and/or through comparison with known, homologous proteins from other organisms, approaches which may be prone to error. In the present study, we used a combination of 2-DE, MALDI-TOF-MS and PMF to identify 1151 different proteins in E. coli K12 JM109. Comparison of the experimental with the theoretical Mr and pI values (4000 experimental values each) allowed the identification of numerous proteins with incorrect or incomplete ORF annotations in the current E. coli genome databases. Several inconsistencies in genome annotation were verified experimentally, and up to 55 candidates await further investigation. Our findings demonstrate how an up-to-date 2-D gel-based proteomics approach can be used for improving the annotation of prokaryotic genomes. They also highlight the need for harmonization among the different E. coli genome databases.
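The comparison of experimental with theoretical Mr and pI values described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract gives no thresholds, so the tolerances (`mr_rel_tol`, `pi_abs_tol`), the `Spot` record, the function name, and the placeholder entries `hypA` to `hypC` are all assumptions made for illustration, and the numbers are invented rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """One identified 2-D gel spot (hypothetical record layout)."""
    protein: str
    mr_theoretical: float   # kDa, computed from the annotated ORF
    mr_experimental: float  # kDa, estimated from the gel position
    pi_theoretical: float   # computed from the annotated ORF
    pi_experimental: float  # estimated from the gel position


def flag_discrepancies(spots, mr_rel_tol=0.2, pi_abs_tol=0.5):
    """Return spots whose observed Mr or pI disagrees with the ORF model.

    The tolerances are assumed values, not figures from the paper.
    A flagged entry is only a candidate: processing, modification or
    gel artefacts can also shift a spot, so each needs follow-up.
    """
    flagged = []
    for s in spots:
        # Relative deviation for Mr, absolute deviation for pI.
        mr_dev = (s.mr_experimental - s.mr_theoretical) / s.mr_theoretical
        pi_dev = s.pi_experimental - s.pi_theoretical
        if abs(mr_dev) > mr_rel_tol or abs(pi_dev) > pi_abs_tol:
            flagged.append((s.protein, round(mr_dev, 3), round(pi_dev, 2)))
    return flagged


# Invented placeholder values, for illustration only.
example_spots = [
    Spot("hypA", 30.0, 31.0, 5.5, 5.6),  # agrees with the annotation
    Spot("hypB", 45.0, 30.0, 6.0, 6.1),  # much smaller than predicted
    Spot("hypC", 20.0, 20.5, 9.0, 7.8),  # pI far from the prediction
]

for name, mr_dev, pi_dev in flag_discrepancies(example_spots):
    print(f"{name}: relative Mr deviation {mr_dev}, pI deviation {pi_dev}")
```

In this sketch a spot running well below its predicted Mr, like the placeholder `hypB`, would be a candidate for a mis-assigned start codon or an unannotated processing event, which is the kind of inconsistency the study then checks experimentally.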

MeSH terms

  • Computational Biology
  • Electrophoresis, Gel, Two-Dimensional
  • Escherichia coli / chemistry*
  • Escherichia coli / genetics*
  • Escherichia coli Proteins / chemistry*
  • Escherichia coli Proteins / metabolism
  • Genome, Bacterial*
  • Isoelectric Point
  • Molecular Weight
  • Proteome*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization

Substances

  • Escherichia coli Proteins
  • Proteome