A conditional neural fields model for protein threading

Jianzhu Ma; Jian Peng; Sheng Wang; Jinbo Xu

doi:10.1093/bioinformatics/bts213


Bioinformatics. 2012 Jun 15;28(12):i59-66. doi: 10.1093/bioinformatics/bts213.

Authors

Jianzhu Ma¹, Jian Peng, Sheng Wang, Jinbo Xu

Affiliation

¹ Toyota Technological Institute at Chicago, IL 60637, USA.

Abstract

Motivation: Alignment errors remain the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between the two proteins under consideration is low (<30%).

Results: We present a novel protein threading method, CNFpred, which achieves much more accurate sequence-template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF). CNFpred aligns a protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlations among a variety of protein sequence and structure features and makes use of information in the neighborhood of the two residues being aligned, so it is much more sensitive than the widely used linear or profile-based scoring functions. To train this CNF threading model, we employ a novel quality-sensitive method that directly maximizes the expected quality of the training set, instead of the standard maximum-likelihood method. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as on our own large dataset. CNFpred outperforms the other methods regardless of protein length or class, and works particularly well for proteins with sparse sequence profiles owing to its effective use of structure information. Our methodology can also be adapted to protein sequence alignment.
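To make the idea of a non-linear, neighborhood-aware scoring function concrete, here is a minimal Python sketch under loudly stated assumptions. It is not CNFpred. The per-residue features, the one-hidden-layer tanh gate (`gate_score`), its weights, the window half-width and the linear gap penalty are all hypothetical choices made for illustration. The alignment is decoded with a plain Needleman-Wunsch global dynamic program rather than with the CNF's probabilistic inference. The only point is that the score for aligning query residue i to template residue j is computed from the features of both residues' neighbors and passed through a non-linearity, instead of being a sum of independent per-position profile terms.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-residue feature vectors (e.g. a profile-derived value and a
# secondary-structure indicator). CNFpred's real feature set is richer.
Features = List[Sequence[float]]


def window(feats: Features, i: int, half: int) -> List[float]:
    """Concatenate features of residues i-half..i+half, zero-padded past the ends."""
    dim = len(feats[0])
    out: List[float] = []
    for k in range(i - half, i + half + 1):
        out.extend(feats[k] if 0 <= k < len(feats) else [0.0] * dim)
    return out


def gate_score(q_win: List[float], t_win: List[float],
               weights: Sequence[float], biases: Sequence[float],
               out_w: Sequence[float]) -> float:
    """Hypothetical one-hidden-layer tanh network on the window difference.

    Hidden unit k computes tanh(b_k - w_k * sum|q - t|); the output is a
    weighted sum of the hidden units, so the score is non-linear in the features.
    """
    dist = sum(abs(a - b) for a, b in zip(q_win, t_win))
    hidden = [math.tanh(b - w * dist) for w, b in zip(weights, biases)]
    return sum(o * h for o, h in zip(out_w, hidden))


def align(query: Features, template: Features, half: int = 1, gap: float = -0.5,
          weights=(1.0,), biases=(1.0,), out_w=(1.0,)) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment (Needleman-Wunsch, linear gap penalty) using gate_score."""
    n, m = len(query), len(template)
    # Match score for every (query i, template j) pair, using both neighborhoods.
    s = [[gate_score(window(query, i, half), window(template, j, half),
                     weights, biases, out_w) for j in range(m)] for i in range(n)]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * gap, "up"      # query residue unaligned
    for j in range(1, m + 1):
        dp[0][j], back[0][j] = j * gap, "left"    # template residue unaligned
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [(dp[i - 1][j - 1] + s[i - 1][j - 1], "diag"),
                       (dp[i - 1][j] + gap, "up"),
                       (dp[i][j - 1] + gap, "left")]
            # max() returns the first best option, so ties prefer a match.
            dp[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right cell to recover aligned (i, j) pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    template = [(0.1, 1), (0.9, 1), (0.2, 0), (0.8, 0), (0.5, 1)]
    query = [(0.1, 1), (0.9, 1), (0.8, 0), (0.5, 1)]  # template minus residue 2
    score, pairs = align(query, template)
    print(pairs)  # [(0, 0), (1, 1), (2, 3), (3, 4)]
```

In the toy call, the query is the template with its third residue deleted, and the sketch places the single gap at that position. A purely linear profile score would add independent per-column terms. Here a mismatch in a neighboring residue also lowers the score of the central pair, which is the kind of context sensitivity the abstract attributes to the CNF scoring function. Learning the gate weights, whether by maximum likelihood or by a quality-sensitive objective, is outside this sketch.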

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acid Substitution
Computational Biology / methods*
Likelihood Functions
Protein Structure, Secondary
Proteins / analysis*
Sequence Alignment / methods*
Software

Substances

Proteins
