A general optimization protocol for molecular property prediction using a deep learning network

Jen-Hao Chen; Yufeng Jane Tseng

doi:10.1093/bib/bbab367

Brief Bioinform. 2022 Jan 17;23(1):bbab367. doi: 10.1093/bib/bbab367.

Authors

Jen-Hao Chen¹, Yufeng Jane Tseng²

Affiliations

¹ Department of Computer Science and Information Engineering, National Taiwan University; engineer with Chunghwa Telecom Co., Ltd., Taipei, Taiwan.
² Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.

Abstract

The key to generating the best deep learning model for predicting molecular properties is to test and apply various optimization methods. Individual optimization methods from past works outside the pharmaceutical domain have each succeeded in improving model performance, but greater improvement may be achieved when specific combinations of these methods and practices are applied. In this work, three high-performance optimization methods from the literature, each shown to dramatically improve model performance in other fields, are applied and discussed, resulting in a general procedure for generating optimized CNN models for different molecular properties. The three techniques are (i) a dynamic batch size strategy for different enumeration ratios of the SMILES representation of compounds; (ii) Bayesian optimization for selecting the hyperparameters of a model; and (iii) feature learning using chemical features obtained by a feedforward neural network, which are concatenated with the learned molecular feature vector. A total of seven different molecular properties (water solubility, lipophilicity, hydration energy, electronic properties, blood-brain barrier permeability and inhibition) are used. We demonstrate how each of the three techniques affects the model and how the best model generally benefits from using Bayesian optimization combined with dynamic batch size tuning.
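The feature-concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dense_relu`, `fuse_features`), the single ReLU layer, the random weights and all dimensions are assumptions chosen only to show how chemical features produced by a feedforward layer might be joined to a learned molecular feature vector.

```python
import numpy as np

def dense_relu(x, weights, bias):
    """One feedforward layer with ReLU activation (hypothetical stand-in
    for the network that processes the chemical descriptors)."""
    return np.maximum(x @ weights + bias, 0.0)

def fuse_features(learned_vec, chem_descriptors, weights, bias):
    """Concatenate the learned molecular feature vector with chemical
    features obtained from a feedforward layer."""
    chem_features = dense_relu(chem_descriptors, weights, bias)
    return np.concatenate([learned_vec, chem_features], axis=-1)

# All sizes below are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)
batch, learned_dim, n_descriptors, chem_dim = 4, 16, 8, 5

learned_vec = rng.normal(size=(batch, learned_dim))         # e.g. CNN output per molecule
chem_descriptors = rng.normal(size=(batch, n_descriptors))  # e.g. computed descriptors
weights = rng.normal(size=(n_descriptors, chem_dim))
bias = np.zeros(chem_dim)

fused = fuse_features(learned_vec, chem_descriptors, weights, bias)
print(fused.shape)  # (4, 21): 16 learned + 5 chemical features per molecule
```

In a trained model the fused vector would feed the final prediction layers, and the feedforward weights would be learned jointly with the rest of the network rather than drawn at random.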

Keywords: CNN; deep learning; drug discovery; optimization.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
Deep Learning*
Neural Networks, Computer
Solubility