Identification of adaptor proteins using the ANOVA feature selection technique

Methods. 2022 Dec:208:42-47. doi: 10.1016/j.ymeth.2022.10.008. Epub 2022 Oct 29.

Abstract

Adaptor proteins play a crucial role in regulating lymphocyte activation, so rapid and efficient identification of these proteins is essential for understanding their functions. Biochemical methods, however, are expensive, require long experimental cycles, and demand considerable personnel. A computational method that can accurately identify adaptor proteins is therefore urgently needed. To address this need, we developed a classifier that combines a support vector machine (SVM) with two sequence encodings, the composition of k-spaced amino acid pairs (CKSAAP) and the amino acid composition (AAC). Analysis of variance (ANOVA) was used to select the optimal feature subset, that is, the subset yielding the best prediction performance. When the proposed model was evaluated on independent data, the 447 optimized features achieved an accuracy of 92.39% with an AUC of 0.9766, demonstrating the strong predictive capability of the model. We hope that the proposed model will provide further clues for studying adaptor proteins.
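The feature pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`aac`, `cksaap`, `anova_f`, `rank_features`), the maximum gap `k_max`, and the normalization by the number of valid pairs are assumptions made for illustration, since the abstract does not state these details. The sketch computes AAC and CKSAAP descriptors for a sequence and ranks features by the one-way ANOVA F-statistic (between-class variance divided by within-class variance).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k_max=2):
    """Frequencies of residue pairs separated by k = 0..k_max residues.

    Yields 400 features per gap value. k_max is an assumed parameter;
    pairs containing non-standard residues are skipped.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


def anova_f(groups):
    """One-way ANOVA F-statistic for one feature, given its values per class.

    Requires more samples than classes (otherwise the within-class
    degrees of freedom are zero).
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, means)) / (len(groups) - 1)
    within = sum((x - m) ** 2
                 for g, m in zip(groups, means) for x in g) / (n_total - len(groups))
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return feature indices sorted from highest to lowest F-statistic."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In a workflow of this kind, each sequence would be encoded as `aac(seq) + cksaap(seq)`, the features ranked with `rank_features`, and the top-ranked subset passed to a classifier such as an SVM, with the subset size chosen by validation performance.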

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / metabolism
  • Analysis of Variance
  • Computational Biology* / methods
  • Support Vector Machine*

Substances

  • Amino Acids