Meta-iPVP: a sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation

J Comput Aided Mol Des. 2020 Oct;34(10):1105-1116. doi: 10.1007/s10822-020-00323-z. Epub 2020 Jun 16.

Abstract

Phage virion proteins (PVPs) perforate the host cell membrane, eventually causing cell rupture and thereby releasing replicated phages. The accurate identification of PVPs is thus a crucial step towards improving our understanding of their biological function and mechanisms. Therefore, it is desirable to develop a computational method capable of fast and accurate identification of PVPs. To address this, we propose a novel sequence-based meta-predictor employing probabilistic information (referred to herein as Meta-iPVP) for the accurate identification of PVPs. In particular, an efficient feature representation approach was used to generate discriminative probabilistic features from four machine learning (ML) algorithms making use of seven feature encodings. To the best of our knowledge, Meta-iPVP is the first meta-based approach developed for PVP prediction. Independent test results indicated that Meta-iPVP could discern important characteristics between PVPs and non-PVPs and achieved the best accuracy and Matthews correlation coefficient (MCC) of 0.817 and 0.642, respectively, corresponding to improvements of 6-10% and 14-21% over existing PVP predictors. This demonstrates that the proposed Meta-iPVP is more efficient, robust and promising for the identification of PVPs. The predictive model is deployed as a publicly accessible Meta-iPVP webserver, freely available online at http://camt.pythonanywhere.com/Meta-iPVP .
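The idea of probabilistic features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the seven encodings nor the four ML algorithms, so the two encoders and the two scoring functions below are hypothetical placeholders, as are all function names. The sketch only shows how one P(PVP) score per (encoding, base model) pair might be concatenated into a feature vector for a meta-classifier, and how accuracy and MCC are computed from binary predictions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Encoder = Callable[[str], List[float]]      # sequence -> numeric feature vector
BaseModel = Callable[[List[float]], float]  # feature vector -> score in [0, 1]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq: str) -> List[float]:
    """Amino acid composition (assumed encoding, not named in the abstract)."""
    seq = seq.upper()
    total = len(seq) or 1
    return [seq.count(a) / total for a in AMINO_ACIDS]

def hydrophobic_fraction(seq: str) -> List[float]:
    """Fraction of hydrophobic residues (another assumed, illustrative encoding)."""
    seq = seq.upper()
    return [sum(c in "AILMFWV" for c in seq) / (len(seq) or 1)]

# Placeholder "base models": real ones would be trained classifiers
# returning a calibrated probability that the sequence is a PVP.
def mean_score(x: List[float]) -> float:
    return sum(x) / len(x)

def max_score(x: List[float]) -> float:
    return max(x)

def probabilistic_features(seq: str,
                           encoders: Sequence[Encoder],
                           models_per_encoder: Sequence[Sequence[BaseModel]]) -> List[float]:
    """Concatenate one score per (encoding, base model) pair, in a fixed order."""
    feats: List[float] = []
    for encode, models in zip(encoders, models_per_encoder):
        x = encode(seq)
        feats.extend(model(x) for model in models)
    return feats

def accuracy_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient for binary labels (1 = PVP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 when undefined
    return acc, mcc

# Two encodings x two base models -> a 4-dimensional meta-feature vector.
meta_features = probabilistic_features(
    "MKTAYIAKQR",
    [aac, hydrophobic_fraction],
    [[mean_score, max_score], [mean_score, max_score]],
)
```

With seven encodings and four algorithms, the same construction would presumably yield one score per combination, and a final classifier would be trained on those scores rather than on the raw encodings.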

Keywords: Classification; Feature selection; Machine learning; Meta-predictor; Phage virion protein; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bacteriophages / metabolism*
  • Computational Biology / methods*
  • Humans
  • Machine Learning*
  • Sequence Analysis, Protein / methods*
  • Software
  • Support Vector Machine
  • Viral Proteins / chemistry*
  • Viral Proteins / metabolism
  • Virion / metabolism*

Substances

  • Viral Proteins