New feature extraction from phylogenetic profiles improved the performance of pathogen-host interactions

Front Cell Infect Microbiol. 2022 Aug 2:12:931072. doi: 10.3389/fcimb.2022.931072. eCollection 2022.

Abstract

Motivation: Understanding pathogen-host interactions (PHIs) is an essential but challenging research goal because it can reveal the mechanisms of molecular interaction between different organisms. Experimental exploration of PHIs is time-consuming and labor-intensive, so computational approaches play a crucial role in discovering unknown PHIs between different organisms. Although many machine learning (ML)-based methods have been proposed to predict PHIs, these methods all rely on structure-based information extracted from the sequence. The choice of features is critical to improving the performance of ML-based PHI prediction.

Results: This work proposes a new method that extracts features from phylogenetic profiles as evolutionary information for predicting PHIs. Our approach performs better than existing structure-based and ML-based PHI prediction methods. Combining the five feature extraction models proposed here with structure-based information significantly improved PHI prediction performance, suggesting that phylogenetic profile features combined with structure-based methods can be applied to explore PHIs and discover unknown biological relationships.
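
The abstract does not describe the five extraction models, so the following is only a minimal sketch of the general idea and not the KPP implementation. It assumes a phylogenetic profile is a binary vector recording whether a protein has a homolog in each of a fixed set of reference genomes, and it derives three simple, hypothetical pair features (shared presence, Hamming distance, Jaccard similarity) for one pathogen protein and one host protein. The class and method names are illustrative assumptions.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and features are assumptions, not the KPP code. */
final class PhyloProfileSketch {

    /** Counts reference genomes in which both proteins have a homolog. */
    static int sharedPresence(boolean[] a, boolean[] b) {
        checkSameLength(a, b);
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] && b[i]) count++;
        }
        return count;
    }

    /** Counts reference genomes in which exactly one of the two proteins has a homolog. */
    static int hammingDistance(boolean[] a, boolean[] b) {
        checkSameLength(a, b);
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) count++;
        }
        return count;
    }

    /** Jaccard similarity of the two presence sets; defined as 0.0 when both are empty. */
    static double jaccard(boolean[] a, boolean[] b) {
        checkSameLength(a, b);
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] || b[i]) union++;
        }
        return union == 0 ? 0.0 : (double) sharedPresence(a, b) / union;
    }

    /** Builds a hypothetical feature vector for one pathogen-host protein pair. */
    static double[] pairFeatures(boolean[] pathogen, boolean[] host) {
        return new double[] {
            sharedPresence(pathogen, host),
            hammingDistance(pathogen, host),
            jaccard(pathogen, host)
        };
    }

    private static void checkSameLength(boolean[] a, boolean[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Profiles must cover the same reference genomes");
        }
    }

    public static void main(String[] args) {
        // Made-up profiles over five reference genomes, for illustration only.
        boolean[] pathogenProtein = {true, false, true, true, false};
        boolean[] hostProtein     = {true, true, true, false, false};
        System.out.println(Arrays.toString(pairFeatures(pathogenProtein, hostProtein)));
        // prints [2.0, 2.0, 0.5]
    }
}
```

In a pipeline of the kind the abstract describes, a vector like this would be concatenated with structure-based features before being passed to an ML classifier; which features and which classifier are actually used is not stated here.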

Availability and implementation: The KPP method is implemented in the Java language and is available at https://github.com/yangfangs/KPP.

Keywords: bacteria; machine learning; pathogen-host interaction; phylogenetic profile; virus.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Host-Pathogen Interactions*
  • Machine Learning*
  • Phylogeny