In silico characterization of proteins: UniProt, InterPro and Integr8

Nicola Jane Mulder; Paul Kersey; Manuela Pruess; Rolf Apweiler

doi:10.1007/s12033-007-9003-x


Mol Biotechnol. 2008 Feb;38(2):165-77. doi: 10.1007/s12033-007-9003-x. Epub 2007 Oct 4.

Authors

Nicola Jane Mulder¹, Paul Kersey, Manuela Pruess, Rolf Apweiler

Affiliation

¹ EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. mulder@ebi.ac.uk

PMID: 18219596
DOI: 10.1007/s12033-007-9003-x

Abstract

Nucleic acid sequences from genome sequencing projects are submitted as raw data, from which biologists attempt to elucidate the function of the predicted gene products. The protein sequences are stored in public databases, such as the UniProt Knowledgebase (UniProtKB), where curators try to add predicted and experimental functional information. Protein function prediction can be done using sequence similarity searches, but an alternative approach is to use protein signatures, which classify proteins into families and domains. The major protein signature databases are available through the integrated InterPro database, which provides a classification of UniProtKB sequences. As well as characterization of proteins through protein families, many researchers are interested in analyzing the complete set of proteins from a genome (i.e. the proteome), and there are databases and resources that provide non-redundant proteome sets and analyses of proteins from organisms with completely sequenced genomes. This article reviews the tools and resources available on the web for single and large-scale protein characterization and whole proteome analysis.
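The abstract contrasts similarity searches with protein signatures. The simplest kind of signature is a short sequence pattern, such as those in PROSITE, one of the databases integrated into InterPro. The sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It is a minimal illustration under stated assumptions, not InterPro or PROSITE software. `prosite_to_regex` is a hypothetical helper. It handles only the common pattern elements and skips edge cases such as '>' inside brackets. The example patterns and sequence are illustrative only. Member databases also use profiles and hidden Markov models, which this sketch does not cover.

```python
import re

# Hypothetical helper: one PROSITE-style element is a residue spec
# (single letter, 'x', [allowed], or {disallowed}) plus an optional (n) or (n,m) count.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":            # any amino acid
            core = "."
        elif residue.startswith("{"): # any residue except those listed
            core = "[^" + residue[1:-1] + "]"
        else:                         # single residue or [allowed] set: same in regex syntax
            core = residue
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


if __name__ == "__main__":
    # Illustrative zinc-finger-like pattern and an invented sequence.
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
    match = re.search(regex, "MKCAACGKSFSRSDELTRHIRTHTG")
    print(match.span(), match.group())  # (2, 23) CAACGKSFSRSDELTRHIRTH
```

Translating patterns to regular expressions keeps the sketch dependency-free. The trade-off is that a pattern only reports an exact hit or a miss. Profile and HMM signatures instead score how well a whole region fits a family, which is why InterPro combines several signature types.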

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Review

MeSH terms

Amino Acid Sequence
Computational Biology / methods*
Databases, Protein*
Genome / genetics
Humans
Proteins / chemistry
Proteins / classification
Proteins / genetics
Proteins / metabolism*

Substances

Proteins
