Protein fold recognition using HMM-HMM alignment and dynamic programming

J Theor Biol. 2016 Mar 21:393:67-74. doi: 10.1016/j.jtbi.2015.12.018. Epub 2016 Jan 19.

Abstract

Determining the three-dimensional structure of a protein from its sequence is a challenging task in the biological sciences. Protein fold recognition serves as an intermediate step that assigns a novel protein sequence to one of a set of known folds. The process comprises feature extraction from protein sequences followed by classification of those features with a suitable classifier. Several feature extractors have been developed to retrieve useful information from protein sequences; they generally draw on a protein's sequential, physicochemical and evolutionary properties. Recognition accuracy has gradually improved over the last decade, but it has yet to reach a generally accepted level. In this work, we first applied HMM-HMM alignment of protein sequences with HHblits to extract a profile HMM (PHMM) matrix for each sequence. We then computed the distance between pairs of PHMM matrices using kernelized dynamic programming. We recorded a significant improvement in fold recognition over state-of-the-art feature extractors. The improvement in recognition accuracy is in the range of 2.7-11.6% on three benchmark datasets from the Structural Classification of Proteins (SCOP) database.
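The distance computation between two PHMM matrices can be illustrated with a minimal sketch. The abstract does not give the kernel or the recurrence, so everything here is an assumption: the function name `kernel_dtw_distance`, the Gaussian kernel with width `sigma`, the local cost of one minus the kernel value, and the standard dynamic time warping recurrence (suggested only by the keyword list). Each matrix is taken to have one row per sequence position and one column per profile feature.

```python
import numpy as np


def kernel_dtw_distance(a, b, sigma=1.0):
    """Dynamic time warping distance with a Gaussian-kernel local cost.

    a, b: array-likes of shape (length, n_features); lengths may differ.
    sigma: kernel width (a hypothetical parameter, not from the paper).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

    # Assumed local cost: 0 for identical rows, approaching 1 for distant rows.
    cost = 1.0 - np.exp(-sq_dist / (2.0 * sigma ** 2))

    n, m = cost.shape
    # acc[i, j] holds the cheapest warping cost aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in a only
                acc[i, j - 1],      # advance in b only
                acc[i - 1, j - 1],  # advance in both
            )
    return float(acc[n, m])
```

Under these assumptions, the pairwise distances between a query and a set of training proteins could be passed to a nearest-neighbour rule or converted into a kernel matrix for a support vector machine; which classifier setup the authors used is not stated in the abstract.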

Keywords: Classification; Dynamic time warping; HMM–HMM alignment profile; Protein fold recognition.

MeSH terms

  • Databases, Protein
  • Markov Chains*
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sequence Alignment / methods*
  • Support Vector Machine

Substances

  • Proteins