EBOLApred: A machine learning-based web application for predicting cell entry inhibitors of the Ebola virus

Comput Biol Chem. 2022 Dec:101:107766. doi: 10.1016/j.compbiolchem.2022.107766. Epub 2022 Sep 2.

Abstract

Ebola virus disease (EVD) is a highly virulent and often lethal illness that is transmitted to humans through contact with the body fluids of infected persons. The glycoprotein (GP) and the matrix protein VP40 play essential roles in the virus life cycle within the host: GP mediates entry and fusion of the virus with the host cell membrane, whereas VP40 is responsible for viral particle assembly and budding. This study aimed to develop machine learning models that predict small molecules as possible anti-Ebola virus compounds capable of inhibiting the activities of GP and VP40, using Ebola virus (EBOV) cell entry inhibitors from the PubChem database as training data. Predictive models were developed using five algorithms: random forest (RF), support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (kNN), and logistic regression (LR). The models were evaluated using 10-fold cross-validation. The RF model performed best, with an accuracy of 89 %, an F1 score of 0.9, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.95. The LR and SVM models also performed well, with overall accuracies of 84 % and 86 %, respectively. The RF, LR, and SVM models were deployed as a web server, EBOLApred, accessible via http://197.255.126.13:8000/.
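To make the evaluation protocol concrete, the following is a minimal sketch of 10-fold cross-validation with accuracy and F1 scoring. It is not the authors' code: the abstract does not give the molecular descriptors, dataset size, or hyperparameters, so the feature vectors here are hypothetical, the classifier is a toy kNN (one of the five algorithm families named above), and every function name is invented for illustration. Predictions are pooled across folds before scoring, which is also an assumption; per-fold averaging is an equally common choice. In practice a library such as scikit-learn would normally be used instead of hand-written loops.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F1 for the positive (here: 'inhibitor') class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


def cross_validate(X, y, k=10, seed=0):
    """Hold out each fold once, train on the rest, pool the predictions."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i]))
    return accuracy_and_f1(y_true, y_pred)


if __name__ == "__main__":
    # Hypothetical two-feature "descriptors": label 1 = active, 0 = inactive.
    X = [(0.01 * i, 0.0) for i in range(20)] + [(10 + 0.01 * i, 10.0) for i in range(20)]
    y = [0] * 20 + [1] * 20
    acc, f1 = cross_validate(X, y, k=10)
    print(f"accuracy={acc:.2f} F1={f1:.2f}")
```

Because every sample is held out exactly once, the pooled scores are computed from predictions made only on data the model did not see during training, which is what makes cross-validated figures such as those reported above more trustworthy than training-set accuracy.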

Keywords: Ebola virus protein; Inhibitors; Logistic regression; Machine learning; Random forest; Support vector machine.

MeSH terms

  • Bayes Theorem
  • Ebolavirus*
  • Glycoproteins
  • Humans
  • Machine Learning
  • Virus Internalization

Substances

  • Glycoproteins