Convolutional neural network scoring and minimization in the D3R 2017 community challenge

Jocelyn Sunseri; Jonathan E King; Paul G Francoeur; David Ryan Koes

doi:10.1007/s10822-018-0133-y

J Comput Aided Mol Des. 2019 Jan;33(1):19-34. doi: 10.1007/s10822-018-0133-y. Epub 2018 Jul 10.

Authors

Jocelyn Sunseri¹, Jonathan E King¹, Paul G Francoeur¹, David Ryan Koes²

Affiliations

¹ Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, 3501 Fifth Avenue, Suite 3064, Biomedical Science Tower 3 (BST3), Pittsburgh, PA, 15260, USA.
² Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, 3501 Fifth Avenue, Suite 3064, Biomedical Science Tower 3 (BST3), Pittsburgh, PA, 15260, USA. dkoes@pitt.edu.

Abstract

We assess the ability of our convolutional neural network (CNN)-based scoring functions to perform several common tasks in the domain of drug discovery. These include correctly identifying ligand poses near and far from the true binding mode when given a set of reference receptors and classifying ligands as active or inactive using structural information. We use the CNN to re-score or refine poses generated using a conventional scoring function, AutoDock Vina, and compare the performance of each of these methods to using the conventional scoring function alone. Furthermore, we assess several ways of choosing appropriate reference receptors in the context of the D3R 2017 community benchmarking challenge. We find that our CNN scoring function outperforms Vina on most tasks without requiring manual inspection by a knowledgeable operator, but that the pose prediction target chosen for the challenge, cathepsin S, was particularly difficult for de novo docking. However, the CNN provided best-in-class performance on several virtual screening tasks, underscoring the relevance of deep learning to the field of drug discovery.

Keywords: D3R; Drug design data; Machine learning; Neural networks; Protein–ligand scoring; Virtual screening.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Binding Sites
Cathepsins / chemistry*
Databases, Protein
Drug Discovery / methods
Ligands
Molecular Docking Simulation*
Neural Networks, Computer*
Protein Binding
Protein Conformation
Structure-Activity Relationship

Substances

Ligands
Cathepsins
cathepsin S

Grants and funding

R01 GM108340/GM/NIGMS NIH HHS/United States