SUS-BAR: a database of pig proteins with statistically validated structural and functional annotation

Database (Oxford). 2013 Sep 23:2013:bat065. doi: 10.1093/database/bat065. Print 2013.

Abstract

Given the relevance of the pig proteome in different studies, including human complex maladies, a statistical validation of the annotation is required for a better understanding of the role of specific genes and proteins in the complex networks underlying biological processes in the animal. Presently, approximately 80% of the pig proteome is still poorly annotated, and the existence of protein sequences is routinely inferred automatically by sequence alignment towards preexisting sequences. In this article, we introduce SUS-BAR, a database that derives information mainly from UniProt Knowledgebase and that includes 26 206 pig protein sequences. In SUS-BAR, 16 675 of the pig protein sequences are endowed with statistically validated functional and structural annotation. Our statistical validation is determined by adopting a cluster-centric annotation procedure that allows transfer of different types of annotation, including structure and function. Each sequence in the database can be associated with a set of statistically validated Gene Ontologies (GOs) of the three main sub-ontologies (Molecular Function, Biological Process and Cellular Component), with Pfam functional domains, and when possible, with a cluster Hidden Markov Model that allows modelling the 3D structure of the protein. A database search allows some statistics demonstrating the enrichment in both GO and Pfam annotations of the pig proteins as compared with UniProt Knowledgebase annotation. Searching in SUS-BAR allows retrieval of the pig protein annotation for further analysis. The search is also possible on the basis of specific GO terms and this allows retrieval of all the pig sequences participating into a given biological process, after annotation with our system. Alternatively, the search is possible on the basis of structural information, allowing retrieval of all the pig sequences with the same structural characteristics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Databases, Protein*
  • Gene Ontology
  • Humans
  • Molecular Sequence Annotation*
  • Proteome / genetics
  • Reproducibility of Results
  • Statistics as Topic*
  • Sus scrofa / metabolism*
  • User-Computer Interface

Substances

  • Proteome