Improving Protein Fold Recognition by Deep Learning Networks

Taeho Jo; Jie Hou; Jesse Eickholt; Jianlin Cheng

doi:10.1038/srep17573

Sci Rep. 2015 Dec 4:5:17573. doi: 10.1038/srep17573.

Authors

Taeho Jo¹ ², Jie Hou¹, Jesse Eickholt³, Jianlin Cheng¹

Affiliations

¹ Department of Computer Science, University of Missouri, Columbia, MO 65211, USA.
² Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA.
³ Department of Computer Science, Central Michigan University, Mount Pleasant, MI 48859, USA.

Abstract

For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict whether a given query-template protein pair belongs to the same structural fold. The inputs were derived from sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed results comparable to those of ensemble DN-Fold at the family and superfamily levels. Finally, we extended the binary classification problem of fold recognition to a real-value regression task, which also showed promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
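The Top 1 and Top 5 recognition rates quoted above can be illustrated with a minimal sketch. It assumes each query-template pair has already been given a score (higher meaning more likely to share a fold) and that every query has at least one true match among the templates. The function name, the data layout, and the toy scores are hypothetical and are not taken from the DN-Fold software.

```python
from collections import defaultdict


def top_k_success_rate(scored_pairs, true_matches, k):
    """Fraction of queries with at least one true match among their k best-scoring templates.

    scored_pairs: iterable of (query_id, template_id, score) tuples (hypothetical layout).
    true_matches: set of (query_id, template_id) pairs known to share the structural level.
    """
    by_query = defaultdict(list)
    for query, template, score in scored_pairs:
        by_query[query].append((score, template))

    hits = 0
    for query, candidates in by_query.items():
        # Rank this query's templates by descending score and keep the top k.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if any((query, template) in true_matches for _, template in ranked):
            hits += 1
    return hits / len(by_query) if by_query else 0.0


# Toy data: two queries scored against three templates each (made-up scores).
pairs = [
    ("q1", "t1", 0.9), ("q1", "t2", 0.4), ("q1", "t3", 0.1),
    ("q2", "t1", 0.2), ("q2", "t2", 0.7), ("q2", "t3", 0.6),
]
truth = {("q1", "t1"), ("q2", "t3")}

top1 = top_k_success_rate(pairs, truth, k=1)
top2 = top_k_success_rate(pairs, truth, k=2)
```

In this toy case only q1 has its true match ranked first, while both queries have a true match within their two best templates, so the rate rises as k grows, mirroring the gap between the Top 1 and Top 5 figures.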

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Amino Acid Sequence
Artificial Intelligence
Computational Biology
Databases, Protein
Pattern Recognition, Automated
Protein Conformation*
Protein Folding*
Proteins / chemistry*
Proteins / classification
Sequence Analysis, Protein
Software*

Substances

Proteins
