Optimization Modeling of Anti-breast Cancer Candidate Drugs

Biotechnol Genet Eng Rev. 2023 Mar 24:1-19. doi: 10.1080/02648725.2023.2193484. Online ahead of print.

Abstract

To explore how the estrogen level in vivo can be controlled by regulating estrogen receptor activity during the development of breast cancer drugs, multiple feature evaluation methods were first applied to screen the molecular descriptors of compounds, using the information on estrogen receptor alpha (ERα) antagonists provided in this study. A stacking-integrated regression model combining Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF) was then constructed to quantitatively predict the ERα activity of breast cancer candidate drugs. The model was built from the compounds acting on the target and their biological activity data, with a series of molecular structure descriptors as the independent variables and the biological activity values as the dependent variable. Next, three classification methods, XGBoost, LightGBM, and Gradient Boosting Decision Tree (GBDT), were selected and combined through a voting strategy to build five ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) classification prediction models. Finally, two schemes based on a genetic algorithm (GA) were used to optimize the model and to provide predictions for simultaneously optimizing the biological activity and ADMET properties of ERα antagonists. Results showed that the model predictions have strong practical significance: they can guide the structural optimization of existing active compounds and improve the activity of anti-breast cancer candidate drugs.
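The final GA step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the two GA schemes, so the version below is a generic elitist GA with tournament selection, one-point crossover, and Gaussian mutation. The functions `predict_activity` and `count_admet_passes` are hypothetical toy stand-ins for the trained stacking regressor and the five ADMET voting classifiers, the four descriptors scaled to [0, 1] are invented for illustration, and the `min_passes` threshold is an assumption.

```python
import random

# Hypothetical stand-in for the stacked XGBoost/LightGBM/RF regressor:
# a toy score that peaks when every (scaled) descriptor equals 0.7.
def predict_activity(x):
    return -sum((xi - 0.7) ** 2 for xi in x)

# Hypothetical stand-in for the five ADMET voting classifiers:
# each rule plays the role of one pass/fail classifier output.
def count_admet_passes(x):
    rules = [
        x[0] < 0.9,
        x[1] > 0.2,
        x[2] < 0.8,
        x[0] + x[1] < 1.6,
        x[2] + x[3] > 0.5,
    ]
    return sum(rules)

def fitness(x, min_passes=3):
    # Predicted activity, penalised when fewer than `min_passes`
    # ADMET properties pass (the threshold is an assumption).
    penalty = 10.0 * max(0, min_passes - count_admet_passes(x))
    return predict_activity(x) - penalty

def run_ga(n_genes=4, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Each individual is a vector of descriptor values in [0, 1].
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[:2]  # elitism: the two best individuals survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # One-point crossover.
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clipped back into [0, 1].
            for i in range(n_genes):
                if rng.random() < mutation_rate:
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

if __name__ == "__main__":
    best, history = run_ga()
    print("best descriptors:", [round(g, 3) for g in best])
    print("ADMET properties passed:", count_admet_passes(best))
```

Folding the ADMET requirement into the fitness as a penalty is one simple way to optimize activity and ADMET properties at the same time; a multi-objective formulation would be an equally plausible reading of the abstract.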

Keywords: Anti-breast Cancer; LightGBM; Stacking; XGBoost; estrogen receptors alpha.