Identification of Bacteriophage Virion Proteins Using Multinomial Naïve Bayes with g-Gap Feature Tree

Int J Mol Sci. 2018 Jun 15;19(6):1779. doi: 10.3390/ijms19061779.

Abstract

Bacteriophages, which are tremendously important to the ecology and evolution of bacteria, play a key role in the development of genetic engineering. Bacteriophage virion proteins are essential components of infectious viral particles and are responsible for several biological functions. Correctly identifying bacteriophage virion proteins is therefore of great importance for understanding both life at the molecular level and genetic evolution. However, few computational methods are available for this task. In this paper, we propose a new method that predicts bacteriophage virion proteins using a Multinomial Naïve Bayes classification model based on discrete features generated from the g-gap feature tree. The proposed model reaches an accuracy of 98.37% with a Matthews correlation coefficient (MCC) of 96.27% in 10-fold cross-validation. This result suggests that the proposed method can be a useful approach for identifying bacteriophage virion proteins from sequence information. For the convenience of experimental scientists, a web server (PhagePred) that implements the proposed predictor is freely available online.
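
The abstract names two ingredients, g-gap peptide features and a Multinomial Naïve Bayes classifier, without giving implementation details. The following is a minimal sketch of how the two could fit together, assuming g-gap dipeptide counts over the 20 standard amino acids and Laplace smoothing. The function `g_gap_counts`, the class `TinyMultinomialNB`, and the toy sequences are hypothetical illustrations, not the authors' code or data, and the feature tree and ANOVA-based feature selection are not reproduced here.

```python
from collections import Counter
from itertools import product
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, used as a fixed feature order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_counts(seq, g):
    """Count residue pairs separated by exactly g intervening residues.

    g = 0 gives ordinary adjacent dipeptides. Pairs containing
    non-standard residues are ignored.
    """
    counts = Counter(seq[i] + seq[i + g + 1] for i in range(len(seq) - g - 1))
    return [counts.get(p, 0) for p in PAIRS]


class TinyMultinomialNB:
    """Multinomial Naive Bayes on count vectors (assumed Laplace smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        n_features = len(X[0])
        self.log_prior_ = {}
        self.log_like_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = log(len(rows) / len(y))
            # Per-feature totals within the class, smoothed by alpha.
            totals = [sum(col) for col in zip(*rows)]
            denom = sum(totals) + self.alpha * n_features
            self.log_like_[c] = [log((t + self.alpha) / denom) for t in totals]
        return self

    def predict(self, X):
        def score(x, c):
            return self.log_prior_[c] + sum(
                v * w for v, w in zip(x, self.log_like_[c])
            )

        return [max(self.classes_, key=lambda c: score(x, c)) for x in X]


# Toy, made-up sequences (label 1 = virion, 0 = non-virion); illustration only.
train = [("AKAKAKAKAK", 1), ("KAKAKAKAKA", 1), ("GLGLGLGLGL", 0), ("LGLGLGLGLG", 0)]
X_train = [g_gap_counts(seq, 0) for seq, _ in train]
y_train = [label for _, label in train]

model = TinyMultinomialNB().fit(X_train, y_train)
predictions = model.predict([g_gap_counts("AKAKAK", 0), g_gap_counts("GLGLGL", 0)])
```

In practice, count vectors for several values of g could be concatenated and passed to an established implementation such as scikit-learn's `MultinomialNB`; the hand-written class is used here only to keep the sketch dependency-free.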

Keywords: ANOVA; Multinomial Naïve Bayes; bacteriophage virion proteins; g-gap peptides.

MeSH terms

  • Bacteriophages / chemistry*
  • Bayes Theorem
  • Sequence Analysis, Protein / methods*
  • Sequence Analysis, Protein / standards
  • Software
  • Viral Structural Proteins / chemistry*

Substances

  • Viral Structural Proteins