ALADDIN: Docking Approach Augmented by Machine Learning for Protein Structure Selection Yields Superior Virtual Screening Performance

Mol Inform. 2020 Apr;39(4):e1900103. doi: 10.1002/minf.201900103. Epub 2019 Nov 8.

Abstract

Protein flexibility and solvation pose major challenges to docking algorithms and scoring functions. One established strategy for addressing these challenges is to dock against multiple protein conformations (all-against-all ensemble docking). Recent studies have shown that the performance of ensemble docking can be improved by selecting the most relevant protein structures for docking. In search of a robust approach to protein structure selection, we developed an integrated mAchine Learning AnD DockINg approach (ALADDIN). ALADDIN employs a battery of random forest classifiers to select, individually for each compound of interest, the single most suitable protein structure for docking from an ensemble of protein structures. On three out of four investigated targets, ALADDIN outperformed the best single-structure docking runs, ensemble docking and a similarity-based docking approach, with area under the receiver operating characteristic curve (AUC) values up to 0.15, 0.11 and 0.16 higher, respectively. Only for cytochrome P450 3A4 did ALADDIN, like all of the other tested approaches, fail to achieve acceptable performance. ALADDIN can be particularly useful for structure-based virtual screening of malleable proteins, including kinases, some viral enzymes and anti-targets.
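The per-compound selection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state what each random forest classifier predicts, which compound descriptors it uses, or how ties are handled. The sketch assumes one hypothetical "suitability model" per protein structure that maps a compound's descriptor vector to a score between 0 and 1, and it assumes that the docking score obtained against the selected structure is the one reported for the compound. All names (`select_structure`, `score_compound`, `struct_A`, `struct_B`) and all numbers are illustrative.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stand-in for one trained classifier per protein structure:
# it takes a compound's descriptor vector and returns a suitability score.
SuitabilityModel = Callable[[Sequence[float]], float]


def select_structure(
    descriptors: Sequence[float],
    models: Dict[str, SuitabilityModel],
) -> str:
    """Return the ID of the structure whose model scores the compound highest."""
    if not models:
        raise ValueError("at least one structure model is required")
    return max(models, key=lambda structure_id: models[structure_id](descriptors))


def score_compound(
    descriptors: Sequence[float],
    models: Dict[str, SuitabilityModel],
    docking_scores: Dict[str, float],
) -> float:
    """Report the docking score from the single selected structure (assumption)."""
    chosen = select_structure(descriptors, models)
    return docking_scores[chosen]


# Toy models and docking scores, made up purely for illustration.
toy_models: Dict[str, SuitabilityModel] = {
    "struct_A": lambda d: 0.2 + 0.1 * d[0],
    "struct_B": lambda d: 0.9 - 0.2 * d[0],
}
toy_docking_scores = {"struct_A": -7.5, "struct_B": -9.1}

chosen_for_small = select_structure([1.0], toy_models)  # struct_B scores higher here
chosen_for_large = select_structure([3.0], toy_models)  # struct_A scores higher here
```

In contrast to all-against-all ensemble docking, which combines results across every structure, this scheme uses a single structure per compound, and different compounds may be routed to different structures.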

Keywords: ensemble docking; machine learning; similarity-based docking; structure selection; virtual screening.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Machine Learning*
  • Molecular Docking Simulation*
  • Protein Conformation
  • Proteins / chemistry*

Substances

  • Proteins