ML-DTD: Machine Learning-Based Drug Target Discovery for the Potential Treatment of COVID-19

Vaccines (Basel). 2022 Sep 30;10(10):1643. doi: 10.3390/vaccines10101643.

Abstract

Recent research has highlighted that a large fraction of druggable protein targets in the human interactome remains unexplored for various diseases. Exploring these targets could support drug repurposing studies and the in silico prediction of new interactions between drugs and human protein targets. The same applies to COVID-19, the current pandemic and a global health concern. It is highly desirable to identify potential human drug targets for COVID-19 using a machine learning approach, since it saves time and labor compared to traditional experimental methods. Structure-based drug discovery, where druggability is determined by molecular docking, is appropriate only for proteins whose three-dimensional structures are available. Machine learning algorithms that learn features distinguishing targets from non-targets can also be applied to proteins whose 3-D structures are unavailable. In this research, a Machine Learning-based Drug Target Discovery (ML-DTD) approach is proposed. A machine learning model is first built and tested on a curated dataset of COVID-19 human drug targets and non-targets, assembled from the Therapeutic Target Database (TTD) and the human interactome, using several classifiers: XGBoost Classifier, AdaBoost Classifier, Logistic Regression, Support Vector Classification, Decision Tree Classifier, Random Forest Classifier, Naive Bayes Classifier, and K-Nearest Neighbour Classifier (KNN). In this method, the protein features include Gene Set Enrichment Analysis (GSEA) ranking, properties derived from the protein sequence, and encoded protein network centrality-based measures. Among these classifiers, the XGBoost, KNN, and Random Forest models perform satisfactorily and consistently. The model is then used to predict novel COVID-19 human drug targets, which are further validated by target pathway analysis, the associated repurposed drugs that emerge, and subsequent docking studies of those drugs.
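As an illustration of the three feature families named in the abstract, the following minimal sketch assembles one feature vector per protein. It is not the authors' implementation. The abstract does not say which sequence properties or centrality measures were used, so amino acid composition and normalized degree centrality are assumptions chosen for simplicity, and the function names, protein identifiers, sequences, and rank values are hypothetical toy data.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return {a: 0.0 for a in AMINO_ACIDS}
    return {a: counts[a] / total for a in AMINO_ACIDS}


def degree_centrality(edges):
    """Normalized degree centrality for an undirected interaction network.

    Assumption: degree centrality stands in for the unspecified
    "network centrality-based measures" mentioned in the abstract.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def feature_vector(protein, sequence, centrality, gsea_rank):
    """Concatenate GSEA rank, centrality, and sequence composition."""
    comp = amino_acid_composition(sequence)
    return [gsea_rank, centrality.get(protein, 0.0)] + [comp[a] for a in AMINO_ACIDS]


# Toy inputs (hypothetical, not taken from the paper).
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
toy_sequences = {"P1": "MKTAYIAK", "P2": "GGAVLK", "P3": "MSTNPKPQ", "P4": "AAGG"}
toy_gsea_rank = {"P1": 12.0, "P2": 250.0, "P3": 3.0, "P4": 981.0}

centrality = degree_centrality(toy_edges)
features = {
    p: feature_vector(p, seq, centrality, toy_gsea_rank[p])
    for p, seq in toy_sequences.items()
}
```

Each resulting vector has 22 entries (one rank, one centrality value, and 20 composition fractions) and could be passed, together with target and non-target labels, to any of the classifiers listed in the abstract.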

Keywords: COVID-19 human drug non-targets; COVID-19 human drug targets; docking; gene set enrichment analysis; machine learning; network centrality; protein sequence.

Grants and funding

UGC Research Award (F.30-31/2016(SA-II)) from UGC, Government of India, and DBT project (No. BT/PR16356/BID/7/596/2016), Ministry of Science and Technology, Government of India; Excellence Initiative: Research University (IDUB) grant BOB-IDUB-622-197/2021; Polish National Science Center (2019/35/O/ST6/02484 and 2020/37/B/NZ2/03757); Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund; European Commission Horizon 2020 Marie Skłodowska-Curie ITN Enpathy grant ‘Molecular Basis of Human enhanceropathies’; National Institutes of Health (USA) 4DNucleome grant 1U54DK107967-01 “Nucleome Positioning System for Spatiotemporal Genome Organization and Regulation”; Marie Skłodowska-Curie Action: co-supported as RENOIR Project by the European Union Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 691152 and by Ministry of Science and Higher Education (Poland), grant Nos. W34/H2020/2016, 329025/PnH/2016; Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) programme; Polish Ministry of Science and Higher Education (decision no. 7054/IA/SP/2020 of 28 August 2020).