Beware of moving targets: reference proteome content fluctuates substantially over the years

J Bioinform Comput Biol. 2012 Dec;10(6):1250020. doi: 10.1142/S0219720012500205. Epub 2012 Aug 7.

Abstract

Reference proteomes are generated by increasingly sophisticated annotation pipelines as part of regular genome build releases, yet the resulting changes in reference proteome content are dramatic. Over the history of the NCBI-curated human proteome, the total number of entries has remained roughly constant, but approximately half of the proteins from build 33 (2003) are no longer represented by entries in current releases, while about the same number of new proteins have been added (at sequence identity thresholds of 50-90%). Although mostly hypothetical proteins are affected, there are also spectacular cases of removal or addition of entries for well-studied proteins. The changes between the 2003 and recent human proteomes are of a similar order of magnitude to the differences between recent human and chimpanzee proteome releases. As an application example, we show that the proteome fluctuations affect the interpretation of organelle-specific mass-spectrometry data (about 74% of hits). Proteome quality tends to improve with more recent releases; for example, the fraction of proteins with functional annotation has increased over time. Nevertheless, existing evidence suggests that proteome content remains incomplete, not only with respect to isoforms and sequence variants but also with respect to clearly distinct proteins and protein families.
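
The release-to-release comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each release is a mapping from entry name to amino-acid sequence, and it uses the similarity ratio from Python's difflib as a crude stand-in for alignment-based sequence identity. The entry names, sequences, and the function names similarity and compare_releases are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity (assumption: not a real alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def compare_releases(old: dict, new: dict, threshold: float):
    """Return (removed, added) entry names at the given similarity threshold.

    An old entry counts as removed if no sequence in the new release reaches
    the threshold; a new entry counts as added if no old sequence does.
    """
    def unmatched(src: dict, dst: dict) -> list:
        return [
            name
            for name, seq in src.items()
            if not any(similarity(seq, other) >= threshold for other in dst.values())
        ]

    return unmatched(old, new), unmatched(new, old)


# Toy data (hypothetical sequences, for illustration only).
old_release = {"P1": "MKTAYIAKQRQISFVKSHFSRQ", "P2": "MSDNGTLLEQ"}
new_release = {"Q1": "MKTAYIAKQRQISFVKSHFSRQ", "Q2": "MALWMRLLPLLALLALWG"}

removed, added = compare_releases(old_release, new_release, threshold=0.9)
print("no longer represented:", removed)
print("newly added:", added)
```

The all-against-all loop is quadratic in the number of entries, so a comparison at real proteome scale would presumably rely on a dedicated sequence search or clustering tool rather than this approach.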

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Humans
  • Mass Spectrometry
  • Proteome / analysis*
  • Proteomics / methods*

Substances

  • Proteome