Review and comparative analysis of machine learning-based phage virion protein identification methods

Biochim Biophys Acta Proteins Proteom. 2020 Jun;1868(6):140406. doi: 10.1016/j.bbapap.2020.140406. Epub 2020 Mar 2.

Abstract

Phage virion protein (PVP) identification plays a key role in elucidating relationships between phages and their hosts. Moreover, PVP identification can facilitate the design of related biochemical entities. Recently, several machine learning approaches have emerged for this purpose and have shown promise. In this study, the proposed PVP identifiers are systematically reviewed, and the related algorithms and tools are comprehensively analyzed. We summarize the common framework of these PVP identifiers and construct our own novel identifiers based on that framework. Furthermore, we compare the performance of all PVP identifiers using a training dataset and an independent dataset. Highlighting the pros and cons of these identifiers demonstrates that g-gap DPC (dipeptide composition) features are capable of representing the characteristics of PVPs. Moreover, SVM (support vector machine) proves to be the more effective classifier for distinguishing PVPs from non-PVPs.
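The g-gap DPC feature named in the abstract can be illustrated with a minimal sketch. The abstract does not define the feature, so this follows the common definition: the frequency of each ordered pair of the 20 standard amino acids separated by g intervening residues, giving a 400-dimensional vector. The function name `g_gap_dpc`, the choice to skip pairs containing non-standard residues, and the normalisation by the number of valid pairs are assumptions made for illustration, not details taken from the reviewed identifiers.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dpc(sequence, g=0):
    """Return a 400-dimensional g-gap dipeptide composition vector.

    Hypothetical helper: g=0 gives the ordinary (adjacent) dipeptide
    composition; g=1 pairs residue i with residue i+2, and so on.
    """
    counts = dict.fromkeys(PAIRS, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        # Assumption: pairs with non-standard residues (X, B, Z, ...) are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short for this gap, or no valid pairs at all.
        return [0.0] * len(PAIRS)
    # Assumption: normalise by the number of valid pairs so entries sum to 1.
    return [counts[p] / total for p in PAIRS]


features = g_gap_dpc("ACAC", g=1)  # pairs seen: "AA" and "CC"
```

A vector of this kind, computed for each protein, is the sort of fixed-length input that a classifier such as an SVM can be trained on; the choice of g is a tunable parameter.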

Keywords: G-gap DPC; Machine learning; Phage virion proteins; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms
  • Bacteriophages / metabolism*
  • Computational Biology / methods*
  • Machine Learning*
  • Support Vector Machine
  • Viral Proteins / isolation & purification*
  • Virion / metabolism*

Substances

  • Viral Proteins