Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting

Omar Alghushairy; Farman Ali; Wajdi Alghamdi; Majdi Khalid; Raed Alsini; Othman Asiry

doi:10.1080/07391102.2023.2269280

J Biomol Struct Dyn. 2023 Oct 18:1-12. doi: 10.1080/07391102.2023.2269280. Online ahead of print.

Authors

Omar Alghushairy¹, Farman Ali², Wajdi Alghamdi³, Majdi Khalid⁴, Raed Alsini⁵, Othman Asiry⁶

Affiliations

¹ Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia.
² Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan.
³ Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.
⁴ Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia.
⁵ Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.
⁶ Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia.

PMID: 37850427
DOI: 10.1080/07391102.2023.2269280

Abstract

The identification of druggable proteins (DPs) is important for the development of new drugs, personalized medicine, understanding of disease mechanisms, and drug repurposing, and it carries economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are extracted using composition, transition, and distribution (CTD), composition of K-spaced amino acid pairs (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor called multi-block pseudo amino acid composition (MB-PseAAC). The CTD, CKSAAP, PsePSSM, and MB-PseAAC feature vectors are integrated, and sequential forward selection is applied as the feature selection algorithm. The best features are provided to random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB) classifiers. The predictive performance of these learning methods is measured via 10-fold cross-validation. The LXGB-based model achieves higher results than existing predictors. The proposed protocol is expected to play an active role in designing novel drugs and to be useful for exploring potential targets. This study will help to capture a more universal view of potential targets.

Communicated by Ramaswamy H. Sarma.

Keywords: Druggable proteins; light extreme gradient boosting; machine learning.
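
Of the descriptors named in the abstract, CKSAAP has a simple, widely used definition: for each gap k, count every ordered amino acid pair separated by k residues and divide by the number of such positions in the sequence. The following is a minimal sketch of that general definition, not the authors' implementation. The function name `cksaap`, the default `k_max=2`, and the handling of non-standard residues are assumptions, since the abstract does not state them.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (AA, AC, ..., YY).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=2):
    """Return the CKSAAP vector for gaps k = 0..k_max (400 values per gap).

    k_max=2 is an illustrative default, not a value taken from the paper.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of residue pairs separated by exactly k intervening residues.
        windows = len(seq) - k - 1
        for i in range(max(windows, 0)):
            pair = seq[i] + seq[i + k + 1]
            # Assumption: pairs containing non-standard residues are skipped.
            if pair in counts:
                counts[pair] += 1
        # Normalise by the window count so sequences of different
        # lengths yield comparable frequencies.
        features.extend(
            counts[p] / windows if windows > 0 else 0.0 for p in PAIRS
        )
    return features
```

Calling `cksaap("ACAC", k_max=1)` returns 800 values, 400 for each gap. In a pipeline like the one described, such a vector would be concatenated with the other descriptors before feature selection and classification.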