Inferring multi-target QSAR models with taxonomy-based multi-task learning

Lars Rosenbaum; Alexander Dörr; Matthias R Bauer; Frank M Boeckler; Andreas Zell

doi:10.1186/1758-2946-5-33

J Cheminform. 2013 Jul 11;5(1):33. doi: 10.1186/1758-2946-5-33.

Authors

Lars Rosenbaum¹, Alexander Dörr, Matthias R Bauer, Frank M Boeckler, Andreas Zell

Affiliation

¹ Center for Bioinformatics (ZBIT), University of Tübingen, Sand 1, Tübingen 72076, Germany. lars.rosenbaum@uni-tuebingen.de.

Abstract

Background: Numerous studies indicate that the development of multi-target drugs is beneficial for complex diseases such as cancer. Accurate QSAR models for each of the desired targets assist the optimization of a lead candidate by predicting affinity profiles. Often, the targets of a multi-target drug are similar enough that, in principle, knowledge can be transferred between the QSAR models to improve their accuracy. In this study, we present two multi-task algorithms from the field of transfer learning that exploit the similarity between several targets to transfer knowledge between the target-specific QSAR models.

Results: We evaluated the two methods on simulated data and on a data set of 112 human kinases assembled from the public database ChEMBL. The relatedness between the kinase targets was derived from the taxonomy of the human kinome. The experiments show that, on both types of data, multi-task learning performs better than training separate models, provided that the tasks are sufficiently similar. On the kinase data, the best multi-task approach improved the mean squared error of the QSAR models for 58 kinase targets.

Conclusions: Multi-task learning is a valuable approach for inferring multi-target QSAR models for lead optimization. It is most beneficial when knowledge can be transferred from a similar task with a lot of in-domain knowledge to a task with little in-domain knowledge. Furthermore, the benefit increases as the overlap between the chemical space spanned by the tasks decreases.
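
The transfer effect described in the conclusions can be illustrated with a small toy sketch. It is not one of the two algorithms evaluated in the study, which the abstract does not specify. It assumes a simple stand-in technique: ridge regression for a data-poor task that is regularized toward the weights learned on a similar data-rich task. All names (`solve`, `ridge`, `make_task`, `mse`), the weight vectors, the sample sizes, and the regularization strength are hypothetical choices made for illustration, and the data are synthetic.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge(X, y, lam, prior=None):
    """Minimise ||Xw - y||^2 + lam * ||w - prior||^2 (prior defaults to zero)."""
    d = len(X[0])
    if prior is None:
        prior = [0.0] * d
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) + lam * prior[i] for i in range(d)]
    return solve(A, b)


def make_task(w, n, rng, noise=0.1):
    """Draw n synthetic samples with linear targets y = x.w + Gaussian noise."""
    X = [[rng.gauss(0.0, 1.0) for _ in w] for _ in range(n)]
    y = [sum(a * b for a, b in zip(r, w)) + rng.gauss(0.0, noise) for r in X]
    return X, y


def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    errs = [(sum(a * b for a, b in zip(r, w)) - t) ** 2 for r, t in zip(X, y)]
    return sum(errs) / len(errs)


rng = random.Random(0)
# Two similar tasks: the true weights differ only slightly (assumed values).
w_rich = [1.0, -2.0, 0.5, 1.5, -1.0]
w_poor = [1.1, -1.9, 0.4, 1.6, -0.9]

X_rich, y_rich = make_task(w_rich, 100, rng)  # task with much in-domain data
X_poor, y_poor = make_task(w_poor, 3, rng)    # task with little in-domain data
X_test, y_test = make_task(w_poor, 200, rng)  # held-out data for the poor task

w_prior = ridge(X_rich, y_rich, lam=1.0)                     # learned on rich task
w_single = ridge(X_poor, y_poor, lam=1.0)                    # separate model
w_transfer = ridge(X_poor, y_poor, lam=1.0, prior=w_prior)   # shrink toward rich task

mse_single = mse(w_single, X_test, y_test)
mse_transfer = mse(w_transfer, X_test, y_test)
print("single-task MSE:", mse_single)
print("transfer MSE:   ", mse_transfer)
```

In this toy setting the data-poor task has fewer samples than features, so the separate model cannot recover its weights, while the model regularized toward the similar data-rich task can. If the two weight vectors were made very different, the same prior would be expected to hurt rather than help, which mirrors the requirement that the tasks be sufficiently similar.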