Prediction of protein interdomain linker regions by a hidden Markov model

Kyounghwa Bae; Bani K Mallick; Christine G Elsik

doi:10.1093/bioinformatics/bti363

Bioinformatics. 2005 May 15;21(10):2264-70. doi: 10.1093/bioinformatics/bti363. Epub 2005 Mar 3.

Authors

Kyounghwa Bae¹, Bani K Mallick, Christine G Elsik

Affiliation

¹ Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA.

PMID: 15746283

Abstract

Motivation: Our aim was to predict protein interdomain linker regions from sequence alone, without requiring known homology. Identifying linker regions delineates domain boundaries and can be used to computationally dissect proteins into domains before clustering them into families. We developed a hidden Markov model of linker/non-linker sequence regions using a linker index derived from amino acid propensity. We employed an efficient Bayesian estimation of the model using Markov chain Monte Carlo, specifically Gibbs sampling, to simulate parameters from their posterior distributions. Our model treats the sequence data as continuous rather than categorical and generates probabilistic output.
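
As a rough illustration of the kind of model described above, the following sketch computes a per-residue posterior probability of the linker state with a two-state hidden Markov model whose observations are continuous linker-index values. It is not the authors' implementation. The Gaussian emission densities, every parameter value, the toy index profile, and the names `gaussian_pdf` and `linker_posterior` are assumptions made for illustration, and the parameters are fixed by hand here rather than simulated from their posteriors by Gibbs sampling.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution (assumed emission model)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def linker_posterior(index, means, sds, trans, init):
    """Scaled forward-backward pass for a two-state HMM.

    State 0 = non-linker, state 1 = linker.
    index : per-residue linker-index values (continuous observations)
    means, sds : emission mean and standard deviation for each state
    trans : 2x2 transition matrix, trans[r][s] = P(next = s | current = r)
    init : initial state probabilities
    Returns P(state = linker | whole sequence) for every residue.
    """
    n = len(index)
    states = range(2)
    emit = [[gaussian_pdf(x, means[s], sds[s]) for s in states] for x in index]

    # Forward pass, rescaled at each position to avoid underflow.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            a = [init[s] * emit[0][s] for s in states]
        else:
            a = [emit[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in states)
                 for s in states]
        c = sum(a)
        alpha.append([v / c for v in a])
        scale.append(c)

    # Backward pass, using the same scaling constants.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in states)
             for s in states]
        beta[t] = [v / scale[t + 1] for v in b]

    # Combine into posterior state probabilities and keep the linker state.
    posterior = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in states]
        posterior.append(g[1] / sum(g))
    return posterior


# Toy profile (hypothetical values): a stretch of low index values flanked
# by higher ones. The sign convention of the index is an assumption.
toy_index = [0.5] * 5 + [-0.5] * 5 + [0.5] * 5
post = linker_posterior(
    toy_index,
    means=[0.3, -0.3],               # assumed: linker state has the lower mean
    sds=[0.3, 0.3],
    trans=[[0.95, 0.05], [0.10, 0.90]],
    init=[0.9, 0.1],
)
for position, p in enumerate(post, start=1):
    print(f"{position:2d}  P(linker) = {p:.3f}")
```

In a Bayesian treatment along the lines the abstract describes, the emission and transition parameters would not be fixed as they are here. They would be drawn repeatedly from their conditional posteriors, and the per-residue probabilities could then be averaged over the draws.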

Results: We applied our method to a dataset of protein sequences in which domains and interdomain linkers had been delineated using the Pfam-A database. The prediction results are superior to those of a simpler method that also uses the linker index.

Publication types

Evaluation Study

MeSH terms

Algorithms*
Amino Acid Sequence
Binding Sites
Computer Simulation
Markov Chains
Models, Chemical*
Models, Molecular*
Models, Statistical
Molecular Sequence Data
Protein Binding
Protein Structure, Tertiary
Proteins / analysis*
Proteins / chemistry*
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*

Substances

Proteins