Validating annotations for uncharacterized proteins in Shewanella oneidensis

OMICS. 2008 Sep;12(3):211-5. doi: 10.1089/omi.2008.0051.

Abstract

Proteins of unknown function are a barrier to our understanding of molecular biology. Assigning functions to these "uncharacterized" proteins is imperative but challenging. The usual approach is to run similarity searches against annotation databases, which are useful for predicting function. However, because the performance of these databases on uncharacterized proteins is largely unknown, the accuracy of their predictions is suspect, which makes annotation difficult. To address this challenge, we developed a benchmark annotation dataset of 30 proteins in Shewanella oneidensis. The proteins in the dataset were originally uncharacterized after the initial annotation of the S. oneidensis proteome in 2002. In the intervening 5 years, new experimental evidence has accumulated that enables specific functions to be predicted for them. We used this benchmark dataset to evaluate several commonly used annotation databases. According to our criteria, six annotation databases accurately predicted functions for at least 60% of the proteins in our dataset. Two of these six had a "conditional accuracy" of 90%; conditional accuracy is a second evaluation metric we developed, which excludes cases in which a database predicted no function. In addition, the functions of 27 of the 30 proteins were correctly predicted by at least one database. These results represent one of the first performance evaluations of annotation databases on uncharacterized proteins. Our evaluation indicates that these databases readily incorporate new information and accurately predict functions for uncharacterized proteins, provided that experimental evidence of function exists.
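The difference between the two metrics, and the "correct in at least one database" count, can be illustrated with a minimal sketch. It assumes a hypothetical per-protein outcome for each database, recorded as "correct", "incorrect", or None when the database predicted no function; these labels, the function names, and the toy data are illustrative assumptions, not the authors' scoring procedure or results. Plain accuracy divides correct predictions by all benchmark proteins, while conditional accuracy divides them only by the proteins for which a prediction was made.

```python
from typing import Dict, Optional, Sequence

# Hypothetical outcome labels for one database on one protein (an assumption):
#   "correct"   -> predicted function matches the benchmark annotation
#   "incorrect" -> a function was predicted but does not match
#   None        -> the database predicted no function
Outcome = Optional[str]


def accuracy(outcomes: Sequence[Outcome]) -> float:
    """Fraction of all benchmark proteins with a correct prediction."""
    if not outcomes:
        raise ValueError("no proteins to score")
    correct = sum(1 for o in outcomes if o == "correct")
    return correct / len(outcomes)


def conditional_accuracy(outcomes: Sequence[Outcome]) -> float:
    """Fraction correct among proteins for which a function was predicted.

    Proteins with no prediction (None) are dropped from the denominator.
    """
    predicted = [o for o in outcomes if o is not None]
    if not predicted:
        raise ValueError("database made no predictions")
    correct = sum(1 for o in predicted if o == "correct")
    return correct / len(predicted)


def covered_by_any(results: Dict[str, Sequence[Outcome]]) -> int:
    """Number of proteins correctly predicted by at least one database."""
    per_db = list(results.values())
    if not per_db:
        return 0
    n_proteins = len(per_db[0])
    return sum(
        1 for i in range(n_proteins) if any(db[i] == "correct" for db in per_db)
    )


if __name__ == "__main__":
    # Toy data: 5 proteins, 2 hypothetical databases (not the paper's data).
    toy = {
        "db_a": ["correct", "correct", None, "incorrect", "correct"],
        "db_b": [None, "correct", "correct", None, None],
    }
    for name, outcomes in toy.items():
        print(name, accuracy(outcomes), conditional_accuracy(outcomes))
    # prints "db_a 0.6 0.75" then "db_b 0.4 1.0"
    print("covered:", covered_by_any(toy))  # prints "covered: 4"
```

Because conditional accuracy ignores proteins that a database leaves unannotated, a database that predicts rarely but correctly can score high on it while scoring low on plain accuracy, as db_b does in the toy example.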

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacterial Proteins / classification*
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism
  • Databases, Protein*
  • Molecular Sequence Data
  • Reproducibility of Results
  • Sequence Analysis, Protein*
  • Shewanella / chemistry*
  • Shewanella / genetics

Substances

  • Bacterial Proteins