FragAnchor: a large-scale predictor of glycosylphosphatidylinositol anchors in eukaryote protein sequences by qualitative scoring

Genomics Proteomics Bioinformatics. 2007 May;5(2):121-30. doi: 10.1016/S1672-0229(07)60022-9.

Abstract

A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins among the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). First, the NN selects potential GPI-anchored proteins in a dataset; the HMM then parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryote proteomes, FragAnchor predicted that the percentage of highly probable GPI-anchored proteins ranges from 0.21% to 2.01%. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, making it less sensitive to the background noise found in databases and to possibly incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine both sensitivity and information content. The predictor is publicly available at [see text].
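The tandem design described in the abstract (a first-stage filter followed by a second-stage qualitative scorer, both applied only to the C-terminus) can be illustrated with a minimal sketch. This is not FragAnchor's implementation: the fragment length, the cutoff, the score bins, the labels other than "highly probable", and every function name below are assumptions made for illustration, and a toy hydrophobicity heuristic stands in for both the trained NN and the HMM.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical values: the abstract gives no window length, cutoff, or bins.
C_TERM_LEN = 50
STAGE1_CUTOFF = 0.5
QUALITATIVE_BINS = [
    (0.9, "highly probable"),   # label named in the abstract
    (0.7, "probable"),          # assumed label
    (0.0, "weakly probable"),   # assumed label
]


def c_terminal_fragment(sequence: str, length: int = C_TERM_LEN) -> str:
    """Return the last `length` residues (the whole sequence if shorter)."""
    return sequence[-length:]


def qualitative_label(score: float) -> str:
    """Map a numeric score in [0, 1] to the first bin whose threshold it meets."""
    for threshold, label in QUALITATIVE_BINS:
        if score >= threshold:
            return label
    return QUALITATIVE_BINS[-1][1]


def predict(
    sequences: Iterable[Tuple[str, str]],
    stage1_score: Callable[[str], float],
    stage2_score: Callable[[str], float],
) -> List[Tuple[str, str]]:
    """Two-stage prediction: stage 1 filters, stage 2 assigns a qualitative class."""
    results = []
    for name, seq in sequences:
        fragment = c_terminal_fragment(seq)
        if stage1_score(fragment) < STAGE1_CUTOFF:
            continue  # rejected by the first-stage filter
        results.append((name, qualitative_label(stage2_score(fragment))))
    return results


def toy_hydrophobic_fraction(fragment: str) -> float:
    """Stand-in scorer (NOT a trained model): hydrophobic fraction of the last 20 residues."""
    tail = fragment[-20:]
    return sum(res in "AILMFVWC" for res in tail) / len(tail) if tail else 0.0


if __name__ == "__main__":
    demo = [("hydrophobic_tail", "K" * 40 + "L" * 20), ("charged_tail", "K" * 60)]
    print(predict(demo, toy_hydrophobic_fraction, toy_hydrophobic_fraction))
```

Passing the two scorers in as callables keeps the stages independent, so a real classifier and a real sequence model could replace the toy heuristic without changing the pipeline. Because only the C-terminal fragment is scored, a sequence that is truncated at its N-terminus gives the same result as the full-length protein.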

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods*
  • Databases, Protein
  • Eukaryotic Cells / chemistry*
  • Glycosylphosphatidylinositols / chemistry*
  • Glycosylphosphatidylinositols / isolation & purification
  • Glycosylphosphatidylinositols / metabolism*
  • Humans
  • Hydrophobic and Hydrophilic Interactions
  • Markov Chains
  • Models, Genetic
  • Molecular Sequence Data
  • Neural Networks, Computer
  • Predictive Value of Tests
  • Protein Processing, Post-Translational
  • Proteome / analysis
  • Sensitivity and Specificity
  • Sequence Analysis, Protein*

Substances

  • Glycosylphosphatidylinositols
  • Proteome