Prediction of protein interdomain linker regions by a hidden Markov model

Bioinformatics. 2005 May 15;21(10):2264-70. doi: 10.1093/bioinformatics/bti363. Epub 2005 Mar 3.

Abstract

Motivation: Our aim was to predict protein interdomain linker regions using sequence alone, without requiring known homology. Identifying linker regions delineates domain boundaries, and can be used to computationally dissect proteins into domains prior to clustering them into families. We developed a hidden Markov model of linker/non-linker sequence regions using a linker index derived from amino acid propensity. We employed an efficient Bayesian estimation of the model using Markov chain Monte Carlo, Gibbs sampling in particular, to simulate parameters from the posteriors. Our model treats the sequence data as continuous rather than categorical, and generates a probabilistic output.
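To illustrate what a probabilistic output from a two-state hidden Markov model with continuous observations could look like, the following is a minimal sketch, not the authors' implementation. It assumes Gaussian emissions over a per-residue linker index and uses fixed, hypothetical parameter values. The abstract specifies neither the emission distribution nor any parameter values, and in the paper the parameters are estimated by Gibbs sampling rather than fixed by hand. The function name `linker_posterior` and all constants are illustrative assumptions.

```python
import math

# All parameters below are hypothetical, chosen only for illustration.
STATES = ("non-linker", "linker")
START = (0.9, 0.1)            # initial state probabilities
TRANS = ((0.95, 0.05),        # transitions out of the non-linker state
         (0.10, 0.90))        # transitions out of the linker state
MEANS = (0.0, 1.0)            # assumed Gaussian emission mean per state
SDS = (0.5, 0.5)              # assumed Gaussian emission std. dev. per state


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def linker_posterior(values):
    """Return P(state = linker | all values) for each position,
    computed with the scaled forward-backward algorithm."""
    n = len(values)
    if n == 0:
        return []
    k = len(STATES)
    emit = [[gaussian_pdf(x, MEANS[s], SDS[s]) for s in range(k)]
            for x in values]

    # Forward pass, normalized at each position to avoid underflow.
    first = [START[s] * emit[0][s] for s in range(k)]
    total = sum(first)
    fwd = [[p / total for p in first]]
    for t in range(1, n):
        cur = [emit[t][j] * sum(fwd[t - 1][i] * TRANS[i][j] for i in range(k))
               for j in range(k)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also normalized; the scale cancels in the posterior.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(TRANS[i][j] * emit[t + 1][j] * bwd[t + 1][j]
                   for j in range(k))
               for i in range(k)]
        total = sum(cur)
        bwd[t] = [c / total for c in cur]

    # Combine both passes and normalize per position.
    posterior = []
    for t in range(n):
        joint = [fwd[t][s] * bwd[t][s] for s in range(k)]
        posterior.append(joint[1] / sum(joint))
    return posterior


if __name__ == "__main__":
    # Toy linker-index track: low values, a raised stretch, low values again.
    toy_index = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
    for position, prob in enumerate(linker_posterior(toy_index)):
        print(position, round(prob, 3))
```

In this sketch each position receives a probability of belonging to a linker rather than a hard label, so a downstream step could choose its own threshold when cutting a sequence into domains.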

Results: We applied our method to a dataset of protein sequences in which domains and interdomain linkers had been delineated using the Pfam-A database. The prediction results are superior to those of a simpler method that also uses the linker index.

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Binding Sites
  • Computer Simulation
  • Markov Chains
  • Models, Chemical*
  • Models, Molecular*
  • Models, Statistical
  • Molecular Sequence Data
  • Protein Binding
  • Protein Structure, Tertiary
  • Proteins / analysis*
  • Proteins / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins