Machine Learning for Prediction of Drug Targets in Microbe Associated Cardiovascular Diseases by Incorporating Host-pathogen Interaction Network Parameters

Mol Inform. 2022 Mar;41(3):e2100115. doi: 10.1002/minf.202100115. Epub 2021 Oct 22.

Abstract

Host-pathogen interactions play a crucial role in invasion, infection, and induction of the immune response in humans. In this work, four machine learning algorithms, namely Logistic Regression, K-Nearest Neighbor, Support Vector Machine, and Random Forest, were implemented for the classification of drug targets. The algorithms were trained using 3400 host and 3800 pathogen proteins, comprising drug targets and non-drug targets, as learning instances. For each protein, features were computed (68 for pathogen proteins and 73 for host proteins) covering sequence, structure, biological, and host-pathogen network centrality characteristics. The Random Forest classifier achieved the best accuracy after 10-fold cross-validation: 99% accuracy with a ROC-AUC score of 0.99±0.01 for both the pathogen and host training sets. The Eigenvector Centrality of host-pathogen interactions was the top feature for classifying pathogen targets, and the Eigenvector Centrality of host-host interactions was the top feature for classifying host targets. Other features important for classification were the presence of catalytic and binding sites, low instability/aliphatic index, and cellular location. The Random Forest classifier was then used to predict drug targets involved in Microbe Associated Cardiovascular Diseases. The model predicted 331 host and 743 pathogen proteins as drug targets; these can be validated experimentally for therapeutic intervention in Microbe Associated Cardiovascular Diseases.
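The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic two-class points rather than protein features, only a hand-written k-nearest-neighbour classifier is evaluated (one of the four algorithm families named above), and every function name and parameter here (`make_toy_data`, `stratified_folds`, `k=5`, and so on) is a hypothetical choice made for illustration.

```python
import math
import random


def make_toy_data(n_per_class=50, n_features=4, seed=0):
    """Synthetic two-class data; NOT the paper's protein features."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if votes_for_one * 2 > k else 0


def stratified_folds(y, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def cross_validate(X, y, n_folds=10, k=5):
    """Hold out each fold once, train on the rest, return per-fold accuracy."""
    scores = []
    for held_out in stratified_folds(y, n_folds):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores


X, y = make_toy_data()
scores = cross_validate(X, y)
mean_acc = sum(scores) / len(scores)
print(f"mean 10-fold accuracy on toy data: {mean_acc:.3f}")
```

Averaging the per-fold scores gives a single accuracy estimate, and the spread across folds gives an uncertainty of the kind the abstract reports as ±0.01. Comparing several classifiers would amount to repeating `cross_validate` with a different prediction function on the same folds.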

Keywords: Drug targets; Eigenvector Centrality; Machine learning.
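Eigenvector centrality, named above as the top-ranked feature, scores a node highly when it is connected to other highly scored nodes. The sketch below computes it by power iteration on a small undirected network. It is an illustration under stated assumptions, not the authors' code: the protein labels (`H1`, `P1`, ...) and edges are invented, and the shift by the identity matrix (adding each node's own score) is one common way to keep the iteration from oscillating on bipartite host-pathogen graphs.

```python
def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """Power iteration on an undirected graph given as (u, v) pairs."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    score = {node: 1.0 for node in neighbours}
    for _ in range(max_iter):
        # Iterate with A + I: same eigenvectors as A, but no oscillation
        # when the graph is bipartite.
        new = {
            n: score[n] + sum(score[m] for m in neighbours[n])
            for n in neighbours
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - score[n]) for n in new) < tol:
            return new
        score = new
    return score


# Hypothetical toy network: H* are host proteins, P* are pathogen proteins.
toy_edges = [
    ("H1", "P1"), ("H1", "P2"), ("H1", "P3"),
    ("H2", "P1"), ("H2", "H1"),
    ("H3", "P3"),
]
centrality = eigenvector_centrality(toy_edges)
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

In this toy network the hub `H1` receives the highest score, because it touches the most nodes and its neighbours are themselves connected. A per-protein value of this kind could then serve as one column in a feature table for a classifier.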

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cardiovascular Diseases* / drug therapy
  • Host-Pathogen Interactions
  • Humans
  • Machine Learning
  • Proteins
  • Support Vector Machine

Substances

  • Proteins