Machine Learning Approaches to Investigate the Structure-Activity Relationship of Angiotensin-Converting Enzyme Inhibitors

ACS Omega. 2023 Nov 8;8(46):43500-43510. doi: 10.1021/acsomega.3c03225. eCollection 2023 Nov 21.

Abstract

Angiotensin-converting enzyme inhibitors (ACEIs) play a crucial role in treating conditions such as hypertension, heart failure, and kidney diseases. Nevertheless, the ACEIs currently on the market are linked to a variety of adverse effects, including renal insufficiency, which restricts their use. There is thus an urgent need to optimize the currently available ACEIs. This study presents a structure-activity relationship investigation of ACEIs, employing machine learning to analyze data sets sourced from the ChEMBL database. Exploratory data analysis was performed to visualize the physicochemical properties of the compounds by examining their distributions, patterns, and the statistical significance of differences among the bioactivity groups. Scaffold analysis identified 9 representative Murcko scaffolds with frequencies ≥10. Scaffold diversity analysis revealed that active ACEIs had greater scaffold diversity than their intermediate and inactive counterparts, indicating the significance of performing lead optimization on scaffolds of active ACEIs. Scaffolds 1, 3, 6, and 8 are unfavorable in comparison with scaffolds 2, 3, 5, 7, and 9. QSAR investigation of the compiled data set of 549 compounds led to the selection of Mordred descriptors and the Random Forest algorithm as the best model, which afforded robust performance (accuracy: 0.981, 0.77, and 0.745; MCC: 0.972, 0.658, and 0.617 for the training set, 10-fold cross-validation set, and testing set, respectively). To enhance the model's robustness and predictability, we reduced the chemical diversity of the input compounds by restricting them to the 168 compounds matching the 9 most prevalent Murcko scaffolds, followed by QSAR modeling using Mordred descriptors and the extreme gradient boosting algorithm (accuracy: 0.973, 0.849, and 0.823; MCC: 0.959, 0.786, and 0.742 for the training set, 10-fold cross-validation set, and testing set, respectively).
Structure-activity landscape index (SALI) plots further illustrated the structure-activity relationship and enabled the identification of clusters of compounds that form activity cliffs. These findings contribute to the advancement of drug discovery and the optimization of ACEIs.
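
As an illustration of the quantity behind a SALI plot, the sketch below computes the pairwise index in its commonly used form, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), where A is an activity value and sim is a structural similarity. It is a minimal sketch under stated assumptions, not the study's implementation: the compound names, pIC50 values, and fingerprint bit sets are hypothetical toy data, Tanimoto similarity on sets of on-bits stands in for whatever fingerprint and similarity measure were actually used, and the `eps` guard for identical fingerprints is an added convenience.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sali(act_i: float, act_j: float, sim: float, eps: float = 1e-6) -> float:
    """SALI = |A_i - A_j| / (1 - sim).

    eps (an assumption of this sketch) avoids division by zero when two
    compounds have identical fingerprints.
    """
    return abs(act_i - act_j) / max(1.0 - sim, eps)


# Hypothetical toy data: compound id -> (pIC50, fingerprint on-bits).
compounds = {
    "cpd_A": (8.5, frozenset({1, 2, 3, 4, 5, 6, 7, 8})),
    "cpd_B": (5.0, frozenset({1, 2, 3, 4, 5, 6, 7, 9})),
    "cpd_C": (8.3, frozenset({1, 2, 3, 4, 10, 11, 12, 13})),
}


def rank_pairs(data):
    """Return all compound pairs sorted by descending SALI."""
    pairs = []
    for (name_i, (act_i, fp_i)), (name_j, (act_j, fp_j)) in combinations(
        data.items(), 2
    ):
        sim = tanimoto(fp_i, fp_j)
        pairs.append((name_i, name_j, sali(act_i, act_j, sim)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


ranked = rank_pairs(compounds)
for name_i, name_j, score in ranked:
    print(f"{name_i} vs {name_j}: SALI = {score:.2f}")
```

In this toy set, cpd_A and cpd_B are structurally similar yet differ strongly in activity, so their pair ranks first; pairs with high SALI values are the candidates for activity cliffs.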