CoGT: Ensemble Machine Learning Method and Its Application on JAK Inhibitor Discovery

Yingzi Bu; Ruoxi Gao; Bohan Zhang; Luchen Zhang; Duxin Sun

doi:10.1021/acsomega.3c00160

ACS Omega. 2023 Mar 27;8(14):13232-13242. doi: 10.1021/acsomega.3c00160. eCollection 2023 Apr 11.

Authors

Yingzi Bu¹, Ruoxi Gao², Bohan Zhang³, Luchen Zhang¹, Duxin Sun¹

Affiliations

¹ Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, United States.
² Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States.
³ School of Information, University of Michigan, Ann Arbor, Michigan 48109, United States.

Abstract

The discovery of new drug candidates that inhibit an intended target is a complex and resource-intensive process. Machine learning (ML) methods for predicting drug-target interactions (DTI) are a potential way to improve its efficiency. However, traditional ML approaches are limited in accuracy. In this study, we developed CoGT, a novel ensemble model for DTI prediction that uses a multilayer perceptron (MLP) to integrate graph-based models, which extract non-Euclidean molecular structures, with large pretrained models, specifically chemBERTa, which process simplified molecular-input line-entry system (SMILES) strings. The performance of CoGT was evaluated using compounds that inhibit four Janus kinases (JAKs). Results showed that the large pretrained model, chemBERTa, outperformed conventional ML models in predicting DTI across multiple evaluation metrics, while the graph neural network (GNN) was effective for prediction on imbalanced data sets. To take full advantage of the strengths of these different models, we combined them in the ensemble model CoGT, which outperformed the individual ML models in predicting compounds' inhibition of different JAK isoforms. Our data suggest that the ensemble model CoGT has the potential to accelerate drug discovery.
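The ensembling idea described in the abstract, an MLP that combines the outputs of a graph-based model and a SMILES-based pretrained model, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mlp_ensemble`, the two-score input layout, the single hidden layer, and all weight values are assumptions chosen for illustration. In practice the weights would be learned from training data, and the base scores would come from trained GNN and chemBERTa models rather than hand-written numbers.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_ensemble(base_scores, w_hidden, b_hidden, w_out, b_out):
    """Combine base-model scores with a one-hidden-layer MLP.

    base_scores: one score per base model, e.g. [p_gnn, p_chemberta].
    Returns a single probability-like score for "compound inhibits target".
    """
    # Hidden layer: weighted sum of the base scores followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, base_scores)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of hidden units followed by a sigmoid.
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(logit)


# Placeholder weights (hypothetical, not taken from the paper).
W_HIDDEN = [[1.0, 1.0], [1.0, -1.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [2.0, 0.5]
B_OUT = -2.0

# Hypothetical base-model scores in the order [p_gnn, p_chemberta].
likely_active = mlp_ensemble([0.9, 0.8], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
likely_inactive = mlp_ensemble([0.2, 0.1], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)

print(f"both models confident: {likely_active:.3f}")
print(f"both models doubtful:  {likely_inactive:.3f}")
```

With these placeholder weights, a compound that both base models score highly receives an ensemble score above 0.5, and one that both score poorly receives a score below 0.5. A learned MLP can additionally weight the base models differently depending on where each is more reliable, which is the motivation the abstract gives for combining them.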