Optimal modeling of anti-breast cancer candidate drugs screening based on multi-model ensemble learning with imbalanced data

Juan Zhou; Xiong Li; Yuanting Ma; Zejiu Wu; Ziruo Xie; Yuqi Zhang; Yiming Wei

doi:10.3934/mbe.2023237

Math Biosci Eng. 2023 Jan 6;20(3):5117-5134. doi: 10.3934/mbe.2023237.

Authors

Juan Zhou¹, Xiong Li¹, Yuanting Ma², Zejiu Wu³, Ziruo Xie¹, Yuqi Zhang⁴, Yiming Wei¹

Affiliations

¹ School of Software, East China Jiaotong University, Nanchang 330013, China.
² School of Economics and Management, East China Jiaotong University, Nanchang 330013, China.
³ School of Science, East China Jiaotong University, Nanchang 330013, China.
⁴ School of Foreign Languages, East China Jiaotong University, Nanchang 330013, China.

PMID: 36896538

Abstract

Imbalanced data can seriously bias machine learning models, which leads to false positives in the screening of therapeutic drugs for breast cancer. To address this problem, a multi-model ensemble framework combining tree-based, linear, and deep learning models is proposed. Using this framework, we screened the 20 most critical molecular descriptors from the 729 molecular descriptors of 1974 anti-breast cancer drug candidates. To assess the pharmacokinetic properties and safety of the drug candidates, the selected descriptors were then used for subsequent prediction tasks, including bioactivity and absorption, distribution, metabolism, excretion, and toxicity (ADMET). The results show that the proposed method performs better, and is more stable, than the individual models used in the ensemble.
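The abstract does not say how the tree-based, linear, and deep learning models are combined when selecting descriptors. As a minimal sketch of one plausible approach, the example below averages each descriptor's importance rank across models and keeps the top k. The function names, descriptor names, and scores are hypothetical, and mean-rank aggregation is an assumption made for illustration, not the authors' stated method.

```python
from statistics import mean


def rank_scores(scores):
    """Map each descriptor to its rank within one model (1 = most important)."""
    # Sort by descending score; break ties by name so the result is deterministic.
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def ensemble_top_k(model_scores, k):
    """Return the k descriptors with the best (lowest) mean rank across models."""
    per_model_ranks = [rank_scores(scores) for scores in model_scores.values()]
    descriptors = sorted(per_model_ranks[0])
    mean_rank = {d: mean(ranks[d] for ranks in per_model_ranks) for d in descriptors}
    return sorted(descriptors, key=lambda d: (mean_rank[d], d))[:k]


# Hypothetical importance scores from three model families (illustrative only).
model_scores = {
    "tree": {"desc_a": 0.50, "desc_b": 0.30, "desc_c": 0.15, "desc_d": 0.05},
    "linear": {"desc_a": 0.30, "desc_b": 0.45, "desc_c": 0.05, "desc_d": 0.20},
    "deep": {"desc_a": 0.40, "desc_b": 0.35, "desc_c": 0.10, "desc_d": 0.15},
}

selected = ensemble_top_k(model_scores, k=2)
print(selected)
```

Averaging ranks rather than raw scores keeps models with differently scaled importance values comparable. In the setting described above, k would be 20 and each score dictionary would cover all 729 descriptors.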

Keywords: ADMET; ensemble algorithm; estrogen receptor; feature selection; imbalanced data.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Breast Neoplasms* / drug therapy
Early Detection of Cancer*
Female
Humans
Linear Models
Machine Learning