Prediction of protein secondary structure using feature selection and analysis approach

Acta Biotheor. 2014 Mar;62(1):1-14. doi: 10.1007/s10441-013-9203-7. Epub 2013 Sep 20.

Abstract

Predicting the secondary structure of a protein from its amino acid sequence is an important step towards predicting its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is currently about 80%, which is still far from satisfactory. In this study, we propose a novel method that uses the binomial distribution to optimize tetrapeptide structural words, and the increment of diversity combined with a quadratic discriminant to predict three-state protein secondary structure. A benchmark dataset of 2,640 proteins with less than 25% sequence identity was used to train and test the proposed method. In ten-fold cross-validation, the method achieved an overall accuracy of 87.8% in secondary structure prediction. Moreover, the accuracy of the predicted secondary structures ranges from 84% to 89% at the residue level. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words that affect the accuracy of predicted secondary structures.
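The abstract names its building blocks without giving formulas, so the following is only a minimal sketch of how they are commonly defined, not the authors' implementation. It assumes overlapping tetrapeptide counting, a binomial upper-tail probability as the over-representation score for a word, and the usual diversity measure D(X) = N log N - Σ n_i log n_i with ID(X, Y) = D(X + Y) - D(X) - D(Y). All function names are hypothetical, and the quadratic discriminant step is omitted.

```python
from collections import Counter
from math import comb, log


def tetrapeptide_counts(sequence):
    """Count overlapping 4-residue words in an amino acid sequence."""
    return Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))


def binomial_tail(n_total, n_observed, q):
    """Return P(X >= n_observed) for X ~ Binomial(n_total, q).

    Assumption: a small tail probability marks a word as over-represented
    in one structural class, so it would be kept as a feature.
    """
    return sum(
        comb(n_total, k) * q**k * (1 - q) ** (n_total - k)
        for k in range(n_observed, n_total + 1)
    )


def diversity(counts):
    """D(X) = N log N - sum_i n_i log n_i, ignoring zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * log(total) - sum(n * log(n) for n in counts.values() if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    return diversity(x + y) - diversity(x) - diversity(y)
```

Under these assumptions, a query segment's word counts would be compared against each structural class's counts, and the resulting increment of diversity values would serve as inputs to a discriminant function.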

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Databases, Protein
  • Humans
  • Models, Molecular
  • Peptide Fragments / chemistry*
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Sequence Analysis, Protein

Substances

  • Peptide Fragments
  • Proteins