An Ensemble Method to Distinguish Bacteriophage Virion from Non-Virion Proteins Based on Protein Sequence Characteristics

Int J Mol Sci. 2015 Sep 9;16(9):21734-58. doi: 10.3390/ijms160921734.

Abstract

Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as specificity determination for host bacteria, bacteriophage replication and transcription. Accurate identification of bacteriophage virion proteins from bacteriophage protein sequences is significant to understand the complex virulence mechanism in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for bacteriophage virion protein prediction from bacteriophage protein sequences is put forward with hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). When performing on the training dataset 10-fold cross-validation, the presented method achieves a satisfactory prediction result with a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and Matthew's correlation coefficient (MCC) of 0.701, respectively. To evaluate the prediction performance objectively, an independent testing dataset is used to evaluate the proposed method. Encouragingly, our proposed method performs better than previous studies with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and MCC of 0.662 on the independent testing dataset. These results suggest that the proposed method can be a potential candidate for bacteriophage virion protein prediction, which may provide a useful tool to find novel antibacterial drugs and to understand the relationship between bacteriophage and host bacteria. For the convenience of the vast majority of experimental Int. J. Mol. Sci. 2015, 16,21735 scientists, a user-friendly and publicly-accessible web-server for the proposed ensemble method is established.

Keywords: bacteriophage virion proteins; ensemble method; hybrid features; relief.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Bacteriophages / metabolism*
  • Computational Biology / methods*
  • Internet
  • ROC Curve
  • Reproducibility of Results
  • Viral Proteins / chemistry*
  • Viral Proteins / metabolism*
  • Virion / metabolism*
  • Web Browser

Substances

  • Viral Proteins