StackBRAF: A Large-Scale Stacking Ensemble Learning for BRAF Affinity Prediction

Nur Fadhilah Syahid; Natthida Weerapreeyakul; Tarapong Srisongkram

doi:10.1021/acsomega.3c01641

ACS Omega. 2023 Jun 1;8(23):20881-20891. doi: 10.1021/acsomega.3c01641. eCollection 2023 Jun 13.

Authors

Nur Fadhilah Syahid¹, Natthida Weerapreeyakul²,³, Tarapong Srisongkram²,³

Affiliations

¹ Graduate School in the Program of Pharmaceutical Chemistry and Natural Products, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand.
² Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand.
³ Human High Performance and Health Promotion Research Institute, Khon Kaen University, Khon Kaen 40002, Thailand.

Abstract

B-rapidly accelerated fibrosarcoma (BRAF) is a proto-oncogene that plays a vital role in cell signaling and growth regulation. Identifying potent BRAF inhibitors can improve therapeutic success in advanced-stage cancers, particularly metastatic melanoma. In this study, we propose a stacking ensemble learning framework for the accurate prediction of BRAF inhibitors. We obtained 3857 curated molecules from the ChEMBL database, with BRAF inhibitory activity expressed as pIC₅₀, the negative logarithm of the half-maximal inhibitory concentration. Twelve molecular fingerprints were calculated with PaDEL-Descriptor for model training. Three machine learning algorithms (extreme gradient boosting, support vector regression, and multilayer perceptron) were trained on each fingerprint to construct new predictive features (PFs). A meta-ensemble random forest regression model, called StackBRAF, was then built on the resulting 36 PFs (12 fingerprints × 3 algorithms). The StackBRAF model achieves a lower mean absolute error (MAE) and higher coefficients of determination (R² and Q²) than the individual baseline models. The stacking ensemble model also gives good y-randomization results, indicating a strong correlation between molecular features and pIC₅₀ that is not due to chance. An applicability domain of the model was defined using an acceptable Tanimoto similarity score. Moreover, a large-scale high-throughput screening of 2123 FDA-approved drugs against the BRAF protein was demonstrated using the StackBRAF algorithm. The StackBRAF model is therefore a useful drug design tool for BRAF inhibitor discovery and development.
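The applicability-domain idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and that a query molecule counts as in-domain when its highest Tanimoto similarity to any training molecule reaches a cutoff. The function names, the toy fingerprints, and the 0.5 cutoff are all hypothetical choices for illustration; the abstract does not state the threshold or the exact procedure the authors used.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Convention assumed here: two empty fingerprints share no evidence.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between the query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Set[int], training: Iterable[Set[int]],
              threshold: float = 0.5) -> bool:
    """Nearest-neighbor applicability-domain check.

    The 0.5 default is an illustrative placeholder, not the paper's value.
    """
    return max_similarity(query, training) >= threshold


# Toy fingerprints (hypothetical bit indices, not real molecules).
training_set = [{1, 2, 3, 4}, {2, 3, 5, 8}, {10, 11, 12}]
close_query = {1, 2, 3, 5}   # shares 3 bits with each of the first two entries
far_query = {20, 21, 22}     # shares no bits with the training set

print(in_domain(close_query, training_set))
print(in_domain(far_query, training_set))
```

In a screening setting such as the FDA-approved drug library described above, predictions for molecules that fail a check of this kind would typically be treated as less reliable, since the model has seen nothing structurally similar during training.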