Random-forest model for drug-target interaction prediction via Kullback-Leibler divergence

J Cheminform. 2022 Oct 3;14(1):67. doi: 10.1186/s13321-022-00644-1.

Abstract

Virtual screening has significantly improved the success rate of early-stage drug discovery. Recent virtual screening methods have improved owing to advances in machine learning and chemical information. Among these advances, the creative extraction of drug features is important for predicting drug-target interaction (DTI), which is a large-scale virtual screening of known drugs. Herein, we report Kullback-Leibler divergence (KLD) as a DTI feature and a feature-driven classification model applicable to DTI prediction. For this purpose, E3FP three-dimensional (3D) molecular fingerprints are used as the molecular representation of drugs. They allow the computation of 3D similarities between the ligands within each target (Q-Q matrix), which identify the uniqueness of pharmacological targets, and of 3D similarities between a query and a ligand (Q-L vector) in DTIs. The 3D similarity matrices are transformed into probability density functions via kernel density estimation, a nonparametric estimation method. Each density model can exploit the characteristics of each pharmacological target and measure the quasi-distance between the ligands. Furthermore, we developed a random forest model from the KLD feature vectors that successfully predicts DTIs for 17 representative targets (mean accuracy: 0.882, out-of-bag score estimate: 0.876, ROC AUC: 0.990). The method is also applicable to 2D chemical similarity.
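The step from similarity scores to a KLD feature can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity values are synthetic stand-ins for E3FP similarities, the function names, the bandwidth rule (a Silverman-style rule of thumb), the evaluation grid, and the direction of the divergence (query density relative to target density) are all assumptions made for illustration.

```python
import numpy as np


def kde_on_grid(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate, returned as a discrete
    distribution over `grid` (values sum to 1)."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        # Assumed rule-of-thumb bandwidth (Silverman-style).
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    density = density + 1e-12  # avoid log(0) where the estimate underflows
    return density / density.sum()


def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) for two distributions on the same grid."""
    return float(np.sum(p * np.log(p / q)))


# Synthetic similarity scores in [0, 1] (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
qq_similarities = rng.beta(2, 8, size=200)    # stand-in for one target's Q-Q values
ql_like_target = rng.beta(2, 8, size=50)      # query resembling the target's ligands
ql_unlike_target = rng.beta(8, 2, size=50)    # query with a different similarity profile

grid = np.linspace(0.0, 1.0, 201)
p_target = kde_on_grid(qq_similarities, grid)

kld_like = kl_divergence(kde_on_grid(ql_like_target, grid), p_target)
kld_unlike = kl_divergence(kde_on_grid(ql_unlike_target, grid), p_target)
print(f"KLD (similar profile):   {kld_like:.3f}")
print(f"KLD (different profile): {kld_unlike:.3f}")
```

Under these assumptions, one such KLD value per target would form a feature vector for a query, which could then be passed to a random forest classifier.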

Keywords: 3D Molecular Fingerprint; 3D Similarity; Chemocentric; Drug–Target Interaction Feature; Kullback–Leibler Divergence; Machine Learning; Nonparametric Density Estimation.