Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods

Molecules. 2018 Aug 10;23(8):2000. doi: 10.3390/molecules23082000.

Abstract

Accurate identification of phage virion proteins is a key step toward understanding their function and also helps clarify the mechanism by which bacterial cells are lysed. Because traditional experimental methods for identifying phage virion proteins are time-consuming and costly, there is a pressing need for machine learning methods that can identify them accurately and efficiently. In this work, we propose a support vector machine (SVM) based method that combines multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR), each coupled with incremental feature selection (IFS), were applied to select the optimal feature set. In a five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for researchers studying phage virion proteins.
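To make the feature encoding concrete, the following is a minimal Python sketch of a g-gap dipeptide composition and of the one-way ANOVA F statistic that can be used to rank features. It is an illustration under stated assumptions, not the authors' implementation: the function names, the skipping of non-standard residues, and the normalization by the number of valid residue pairs are all assumptions, and the mRMR, IFS, and SVM steps are not shown.

```python
from itertools import product

# The 20 standard amino acids and the 400 ordered residue pairs built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional frequency vector of residue pairs
    separated by g intervening positions (g=0 gives adjacent pairs)."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    # Assumption: normalize by the number of valid pairs actually counted.
    return [counts[d] / total for d in DIPEPTIDES]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across classes:
    between-class mean square divided by within-class mean square."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(
        len(grp) * (sum(grp) / len(grp) - grand_mean) ** 2
        for grp in groups.values()
    )
    within = sum(
        (v - sum(grp) / len(grp)) ** 2
        for grp in groups.values()
        for v in grp
    )
    if k < 2 or n <= k or within == 0:
        return 0.0  # F is undefined here; treat the feature as uninformative
    return (between / (k - 1)) / (within / (n - k))
```

Under these assumptions, each protein sequence would be encoded once per chosen gap value g, the resulting feature columns ranked by `anova_f_score` against the virion/non-virion labels, and the top-ranked features passed on to the later selection and classification steps.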

Keywords: ANOVA; feature fusion; mRMR; machine learning; phage virion protein.

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Bacteriophages*
  • Computational Biology / methods*
  • Databases, Protein
  • ROC Curve
  • Reproducibility of Results
  • Support Vector Machine*
  • Viral Proteins / chemistry*
  • Virion*

Substances

  • Viral Proteins