Integrating Incompatible Assay Data Sets with Deep Preference Learning

Xiaolin Sun; Ryo Tamura; Masato Sumita; Kenichi Mori; Kei Terayama; Koji Tsuda

doi:10.1021/acsmedchemlett.1c00439

ACS Med Chem Lett. 2021 Dec 29;13(1):70-75. doi: 10.1021/acsmedchemlett.1c00439. eCollection 2022 Jan 13.

Authors

Xiaolin Sun¹, Ryo Tamura¹,²,³,⁴, Masato Sumita³,⁴, Kenichi Mori⁵, Kei Terayama⁴,⁶, Koji Tsuda¹,²,⁴

Affiliations

¹ Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan.
² Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan.
³ International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan.
⁴ RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan.
⁵ Astellas Pharma Inc., Tsukuba, Ibaraki 305-8585, Japan.
⁶ Graduate School of Medical Life Science, Yokohama City University, Yokohama 230-0045, Japan.

Abstract

A large amount of bioactivity assay data has already accumulated in public databases, but integrating these data sets for quantitative structure-activity relationship (QSAR) studies is not straightforward because of differences in experimental methods and settings. We present an efficient deep-learning-based approach called Deep Preference Data Integration (DPDI). To integrate the outcome variables of different assay types, DPDI introduces a surrogate variable and trains a neural network so that the total order induced by the surrogate variable is maximally consistent with the given data sets. In predicting the efficacy of factor Xa inhibitors, DPDI successfully integrated 2959 molecules distributed across 129 assay data sets. In most of our experiments, data integration strongly improved prediction accuracy in both interpolation and extrapolation tasks, indicating that DPDI is an effective tool for QSAR studies.
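The core idea, learning one surrogate score whose induced order agrees with the orderings observed inside each assay, can be illustrated as pairwise preference learning. The following is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. It replaces the neural network with a linear score, uses a RankNet-style pairwise logistic loss (an assumed choice of objective), assumes that a higher measured value means higher activity in every assay, and uses hypothetical toy data and function names (`ASSAYS`, `pairs_within_assays`, `train_surrogate`, `pairwise_accuracy`).

```python
import math
from itertools import combinations

# Hypothetical toy data: assay name -> list of (features, measured value).
# Values were generated from a toy "true activity" 2*x0 - x1, but assay_B
# reports it on a different scale (0.5 * activity + 4), so raw values are
# only comparable within the same assay.
ASSAYS = {
    "assay_A": [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
                ([1.0, 1.0], 1.0), ([2.0, 0.0], 4.0)],
    "assay_B": [([0.0, 0.0], 4.0), ([3.0, 1.0], 6.5), ([1.0, 2.0], 4.0)],
}


def pairs_within_assays(assays):
    """Yield (preferred, other) feature pairs, comparing records only within one assay."""
    for records in assays.values():
        for (xa, ya), (xb, yb) in combinations(records, 2):
            if ya > yb:
                yield xa, xb
            elif yb > ya:
                yield xb, xa
            # Ties carry no ordering information and are skipped.


def score(w, x):
    """Surrogate variable: a linear score w . x (stand-in for a neural network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_surrogate(assays, n_features, lr=0.1, epochs=200):
    """Minimize the mean pairwise loss log(1 + exp(-(s_a - s_b))) by gradient descent."""
    pairs = list(pairs_within_assays(assays))
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for xa, xb in pairs:
            margin = score(w, xa) - score(w, xb)
            # Derivative of log(1 + exp(-margin)) with respect to margin.
            coef = -1.0 / (1.0 + math.exp(margin))
            for k in range(n_features):
                grad[k] += coef * (xa[k] - xb[k])
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


def pairwise_accuracy(w, assays):
    """Fraction of within-assay pairs whose order the surrogate score reproduces."""
    pairs = list(pairs_within_assays(assays))
    return sum(score(w, a) > score(w, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    w = train_surrogate(ASSAYS, n_features=2)
    print("weights:", w)
    print("within-assay pairwise accuracy:", pairwise_accuracy(w, ASSAYS))
```

Because only within-assay comparisons enter the loss, the offset and scale of each assay never matter. Pooling the raw values instead would, for example, rank `[0, 0]` (4.0 in assay_B) above `[1, 0]` (2.0 in assay_A), although the toy activity used to generate the data ranks `[1, 0]` higher.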