Searching for hypothetical proteins: theory and practice based upon original data and literature

Prog Neurobiol. 2005 Sep-Oct;77(1-2):90-127. doi: 10.1016/j.pneurobio.2005.10.001. Epub 2005 Nov 4.

Abstract

A large part of mammalian proteomes consists of hypothetical proteins (HP), i.e. proteins predicted from nucleic acid sequences only and protein sequences of unknown function. Databases are far from complete and errors are to be expected. The many HP await experiments demonstrating their existence at the protein level, and subsequent bioinformatic handling to assign them a tentative function is mandatory. Two-dimensional gel electrophoresis with subsequent mass spectrometric identification of protein spots is an appropriate tool for searching for HP in high-throughput mode. Spots are identified by MS or MS/MS measurements (MALDI-TOF, MALDI-TOF-TOF) followed by analysis with software such as Mascot or ProFound. In many cases proteins can thus be unambiguously identified and characterised; if not, de novo sequencing or Q-TOF analysis is warranted. If the protein is still not identified, the sequence is submitted to databases for BLAST searches to determine identity, similarity or homology to known proteins. If no significant identity to known structures is observed, the protein sequence is examined for functional domains (databases PROSITE, PRINTS, InterPro, ProDom, Pfam and SMART) and searched for motifs (ELM); finally, protein-protein interaction databases (InterWeaver, STRING) are consulted or predictions from conformations are performed. Here we provide information about hypothetical proteins in terms of protein chemical analysis, which is independent of antibody availability and specificity, and of bioinformatic handling, in order to contribute to the extension and completion of protein databases. We include original work on HP in the brain to illustrate the processes of HP identification and functional assignment.
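The order of steps in the abstract can be read as a decision cascade. The Python sketch below is only an illustration of that ordering under stated assumptions, not the authors' software. All names (`run_cascade`, `lookup`, the step labels and the dictionary keys) are hypothetical. Each step is a placeholder that reads a precomputed result from a dictionary; no real search engine or database is called. The sketch also assumes that the identification steps stop at the first hit, whereas the domain, motif and interaction steps are all run and their results collected.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A "spot" is a plain dict of precomputed results (hypothetical keys, made-up values).
Spot = Dict[str, str]
Step = Tuple[str, Callable[[Spot], Optional[str]]]


def lookup(key: str) -> Callable[[Spot], Optional[str]]:
    """Placeholder step: return the stored result for `key`, or None if absent."""
    return lambda spot: spot.get(key)


# Identification steps, tried in the order given in the abstract.
IDENTIFICATION_STEPS: List[Step] = [
    ("ms_search", lookup("ms_search_hit")),      # MS or MS/MS plus search software
    ("de_novo_or_qtof", lookup("de_novo_hit")),  # de novo sequencing or Q-TOF
    ("blast", lookup("blast_hit")),              # BLAST against known proteins
]

# Functional-annotation steps, used only when identification fails.
ANNOTATION_STEPS: List[Step] = [
    ("domains", lookup("domain_hit")),            # functional domain databases
    ("motifs", lookup("motif_hit")),              # motif search
    ("interactions", lookup("interaction_hit")),  # interaction databases
]


def run_cascade(spot: Spot) -> Dict[str, object]:
    """Return which step identified the protein, or the collected functional hints."""
    annotations: Dict[str, str] = {}
    report: Dict[str, object] = {
        "identified_by": None,
        "identity": None,
        "annotations": annotations,
    }
    # Stop at the first identification step that yields a hit.
    for name, step in IDENTIFICATION_STEPS:
        hit = step(spot)
        if hit is not None:
            report["identified_by"] = name
            report["identity"] = hit
            return report
    # No identity found: gather every available functional hint.
    for name, step in ANNOTATION_STEPS:
        hint = step(spot)
        if hint is not None:
            annotations[name] = hint
    return report


# Made-up inputs, for illustration only.
known = run_cascade({"blast_hit": "placeholder known protein"})
unknown = run_cascade({"domain_hit": "placeholder domain",
                       "interaction_hit": "placeholder partner"})
print(known["identified_by"])
print(sorted(unknown["annotations"]))
```

Keeping each step as a named callable means a placeholder could later be swapped for a real search client without changing the control flow.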

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Databases, Protein*
  • Evidence-Based Medicine
  • Gene Expression Profiling / methods*
  • Humans
  • Mass Spectrometry / methods*
  • Molecular Sequence Data
  • Periodicals as Topic*
  • Protein Interaction Mapping / methods
  • Proteins / analysis
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Structure-Activity Relationship

Substances

  • Proteins