Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):2008-2016. doi: 10.1109/TCBB.2020.2966450. Epub 2021 Oct 7.

Abstract

Protein fold recognition, which aims to classify proteins into known protein folds, is one of the most essential steps in protein structure prediction. There are two main computational approaches: template-based methods, which rely on alignment scores between query-template protein pairs, and machine learning methods, which rely on a feature representation and a classifier. Each approach has its own advantages and disadvantages. Can the two be combined to build more accurate predictors for protein fold recognition? In this study, we made an initial attempt and proposed two novel algorithms: TSVM-fold and ESVM-fold. TSVM-fold is based on Support Vector Machines (SVMs) and uses a set of pairwise sequence similarity scores generated by three complementary template-based methods: HHblits, SPARKS-X, and DeepFR. These scores measure the global relationships between query sequences and templates. The resulting comprehensive sequence features are fed into the SVMs for prediction. TSVM-fold is then combined with the HHblits algorithm to improve its generalization ability; the combined method is called ESVM-fold. Experimental results on two rigorous benchmark datasets (the LE and YK datasets) show that the proposed methods outperform some state-of-the-art methods, indicating that TSVM-fold and ESVM-fold are efficient predictors for protein fold recognition.
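
The general idea of feeding template-based similarity scores into an SVM can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature layout, the kernel, the multi-class scheme, or how TSVM-fold and HHblits are combined. The sketch assumes that each method yields one score per template, normalized to [0, 1], that a linear one-vs-rest SVM trained by batch subgradient descent stands in for the real classifier, and that the combination step is a simple threshold rule. All function names, the toy scores, and `hh_threshold` are hypothetical.

```python
METHODS = ("hhblits", "sparksx", "deepfr")  # assumed order of the score blocks

def build_features(scores):
    """Concatenate the per-template scores of all methods into one vector."""
    return [s for m in METHODS for s in scores[m]]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(X, y, epochs=300, lr=0.1, lam=0.01):
    """Linear SVM: batch subgradient descent on the L2-regularized hinge loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        # Examples with margin < 1 contribute to the hinge subgradient.
        viol = [i for i in range(n) if y[i] * (dot(w, X[i]) + b) < 1]
        w = [(1 - lr * lam) * wj + lr * sum(y[i] * X[i][j] for i in viol) / n
             for j, wj in enumerate(w)]
        b += lr * sum(y[i] for i in viol) / n
    return w, b

def train_tsvm(X, folds):
    """One-vs-rest: one binary SVM per fold label."""
    return {f: train_binary_svm(X, [1 if g == f else -1 for g in folds])
            for f in sorted(set(folds))}

def predict_tsvm(models, x):
    """Return the fold whose SVM gives the largest decision value."""
    return max(models, key=lambda f: dot(models[f][0], x) + models[f][1])

def predict_esvm(models, scores, template_folds, hh_threshold=0.95):
    """Hypothetical combination rule: trust a very strong HHblits hit,
    otherwise fall back to the SVM prediction."""
    hh = scores["hhblits"]
    best = max(range(len(hh)), key=hh.__getitem__)
    if hh[best] >= hh_threshold:
        return template_folds[best]
    return predict_tsvm(models, build_features(scores))

# Toy data: three templates, one per fold; all scores are made up.
template_folds = ["a", "b", "c"]

def toy_scores(fold):
    hi_lo = {"hhblits": (0.9, 0.1), "sparksx": (0.8, 0.2), "deepfr": (0.7, 0.3)}
    return {m: [hi if t == fold else lo for t in template_folds]
            for m, (hi, lo) in hi_lo.items()}

train_folds = ["a", "b", "c"]
X = [build_features(toy_scores(f)) for f in train_folds]
models = train_tsvm(X, train_folds)

query = toy_scores("b")
print(predict_tsvm(models, build_features(query)))
print(predict_esvm(models, query, template_folds))
```

In this sketch each protein is described by how similar it is to every template under every method, so the classifier can weigh the three score sources against each other instead of relying on a single best alignment. In practice a kernel SVM from an established library would replace the hand-written trainer.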

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Pattern Recognition, Automated
  • Protein Folding*
  • Proteins* / chemistry
  • Proteins* / genetics
  • Proteins* / metabolism
  • Sequence Analysis, Protein / methods*
  • Support Vector Machine*

Substances

  • Proteins