Accurate prediction of protein structural class

Xia-Yu Xia; Meng Ge; Zhi-Xin Wang; Xian-Ming Pan

doi:10.1371/journal.pone.0037653

PLoS One. 2012;7(6):e37653. doi: 10.1371/journal.pone.0037653. Epub 2012 Jun 19.

Authors

Xia-Yu Xia¹, Meng Ge, Zhi-Xin Wang, Xian-Ming Pan

Affiliation

¹ Ministry of Education, The Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China.

Abstract

Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on 25PDB, a widely used non-homologous benchmark dataset. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position-Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than the accuracy of methods based on predicted secondary structure.
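The general idea described in the abstract, a linear map from a 440D feature vector to a 4D vector followed by a class decision, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a one-hot encoding of the four classes as the regression target, an arg-max decision rule, and purely synthetic features in place of real PSSM-derived ones. The helper names (add_bias, fit_mlr, predict_class) are hypothetical.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so the regression has an intercept term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_mlr(X, Y):
    """Ordinary least-squares fit of Y ~ [X, 1] @ W (one column of W per class)."""
    W, *_ = np.linalg.lstsq(add_bias(X), Y, rcond=None)
    return W

def predict_class(X, W):
    """Map each feature vector to a 4D score vector and pick the largest score."""
    return np.argmax(add_bias(X) @ W, axis=1)

# Synthetic stand-in data (assumption): 440 features per domain, 4 classes,
# with a class-dependent mean so that the classes are linearly separable.
rng = np.random.default_rng(0)
n_domains, n_features, n_classes = 2000, 440, 4
labels = rng.integers(0, n_classes, size=n_domains)
centers = rng.normal(size=(n_classes, n_features))
X = centers[labels] + rng.normal(size=(n_domains, n_features))
Y = np.eye(n_classes)[labels]  # one-hot targets (assumed encoding)

# 10-fold cross-validation: fit on nine folds, test on the held-out fold.
folds = np.array_split(rng.permutation(n_domains), 10)
correct = 0
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    W = fit_mlr(X[train_idx], Y[train_idx])
    correct += int(np.sum(predict_class(X[test_idx], W) == labels[test_idx]))

cv_accuracy = correct / n_domains
print(f"10-fold CV accuracy on synthetic data: {cv_accuracy:.3f}")
```

The accuracy printed here reflects only how easy the synthetic data are to separate and says nothing about performance on real protein domains. A jackknife (leave-one-out) test would follow the same loop with one domain per fold.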

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Protein Conformation
Protein Structure, Tertiary
Proteins / chemistry*
Proteins / classification*

Substances

Proteins