Exploring Alternative Strategies for the Identification of Potent Compounds Using Support Vector Machine and Regression Modeling

Tomoyuki Miyao; Kimito Funatsu; Jürgen Bajorath

doi:10.1021/acs.jcim.8b00584

J Chem Inf Model. 2019 Mar 25;59(3):983-992. doi: 10.1021/acs.jcim.8b00584. Epub 2018 Dec 14.

Authors

Tomoyuki Miyao¹, Kimito Funatsu¹ ², Jürgen Bajorath³

Affiliations

¹ Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.
² Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
³ Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.

PMID: 30547580
DOI: 10.1021/acs.jcim.8b00584

Abstract

Support vector regression (SVR) is a premier approach for the prediction of compound potency. Given the conceptual link between support vector machine (SVM) and SVR modeling, SVR is capable of accounting for continuous and discontinuous structure-activity relationships (SARs) in potency prediction, which further extends the classical quantitative SAR (QSAR) paradigm. In the context of virtual compound screening, compound potency prediction can be applied to identify the most potent compounds that are available or to enrich database selection sets with potent compounds. To these ends, we have evaluated new potency prediction strategies. Conventional (direct) potency prediction using SVR was compared to two-stage SVM-SVR modeling and to potency prediction using SVR models trained in the presence of active and inactive compounds, a previously unconsidered approach. The latter models were found to maximize the recall of potent compounds but were least accurate in predicting high potency values. For accurate prediction of high potency values, direct SVR predictions were preferred. However, the best balance between accurate potency predictions and enrichment of potent compounds in database selection sets was achieved by combined SVM-SVR modeling. Taken together, our findings further extend current approaches for compound potency prediction in virtual compound screening.
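
The difference between direct and two-stage selection can be illustrated with a minimal Python sketch. It is not the authors' protocol: the abstract does not give implementation details, so the function names (`direct_rank`, `two_stage_rank`, `recall_of_potent`), the compound identifiers, the predicted potency values, and the classifier output are all hypothetical. The sketch assumes that the two-stage approach first keeps compounds a classifier predicts to be active and then ranks those by a regression model's predicted potency, whereas the direct approach ranks all compounds by predicted potency. An SVR model trained on active and inactive compounds would change only the predicted values, not the ranking step.

```python
from typing import Callable, Hashable, Iterable, List, Set


def direct_rank(compounds: Iterable[Hashable],
                predict_potency: Callable[[Hashable], float],
                k: int) -> List[Hashable]:
    """Direct strategy: rank all compounds by predicted potency, keep the top k."""
    ranked = sorted(compounds, key=predict_potency, reverse=True)
    return ranked[:k]


def two_stage_rank(compounds: Iterable[Hashable],
                   is_predicted_active: Callable[[Hashable], bool],
                   predict_potency: Callable[[Hashable], float],
                   k: int) -> List[Hashable]:
    """Two-stage strategy (assumed form): filter by a classifier, then rank by regression."""
    survivors = [c for c in compounds if is_predicted_active(c)]
    return direct_rank(survivors, predict_potency, k)


def recall_of_potent(selected: Iterable[Hashable], potent: Set[Hashable]) -> float:
    """Fraction of the truly potent compounds that appear in the selection set."""
    if not potent:
        return 0.0
    return len(set(selected) & potent) / len(potent)


# Hypothetical toy data: predicted potency values (e.g. pKi) from a regression model.
predicted = {"a": 8.5, "b": 7.9, "c": 8.8, "d": 6.0, "e": 5.1}
# Hypothetical classifier output: compounds predicted to be active.
classified_active = {"a", "b", "d"}
# Hypothetical ground truth: the compounds that are actually potent.
potent = {"a", "b"}

direct_selection = direct_rank(predicted, predicted.get, k=2)
two_stage_selection = two_stage_rank(
    predicted, classified_active.__contains__, predicted.get, k=2
)

print(direct_selection, recall_of_potent(direct_selection, potent))
print(two_stage_selection, recall_of_potent(two_stage_selection, potent))
```

In this toy case compound "c" receives a high predicted potency but is rejected by the classifier, so the two-stage selection recovers both potent compounds while the direct selection recovers only one. The numbers are constructed to show the mechanism and say nothing about the relative performance reported in the paper.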

MeSH terms

Drug Evaluation, Preclinical / methods*
Quantitative Structure-Activity Relationship
Regression Analysis
Support Vector Machine*