Toward fully automated high performance computing drug discovery: a massively parallel virtual screening pipeline for docking and molecular mechanics/generalized Born surface area rescoring to improve enrichment

J Chem Inf Model. 2014 Jan 27;54(1):324-37. doi: 10.1021/ci4005145. Epub 2014 Jan 3.

Abstract

In this work we announce and evaluate a high-throughput virtual screening pipeline for in silico screening of virtual compound databases using high performance computing (HPC). A notable feature of this pipeline is an automated receptor preparation scheme with unsupervised binding site identification. The pipeline includes receptor/target preparation, ligand preparation, VinaLC docking calculation, and molecular mechanics/generalized Born surface area (MM/GBSA) rescoring using the GB model by Onufriev and co-workers [J. Chem. Theory Comput. 2007, 3, 156-169]. Furthermore, we leverage HPC resources to perform an unprecedented, comprehensive evaluation of MM/GBSA rescoring when applied to the DUD-E data set (Directory of Useful Decoys: Enhanced), in which we selected 38 protein targets and a total of ∼0.7 million actives and decoys. The wall-clock time for virtual screening has been reduced drastically on HPC machines, which increases the feasibility of screening extremely large ligand databases with more accurate methods. HPC resources allowed us to rescore 20 poses per compound and evaluate the optimal number of poses to rescore. We find that keeping 5-10 poses is a good compromise between accuracy and computational expense. Overall, the results demonstrate that MM/GBSA rescoring has higher average receiver operating characteristic (ROC) area under curve (AUC) values and consistently better early recovery of actives than Vina docking alone. Enrichment performance is nevertheless target-dependent: MM/GBSA rescoring significantly outperforms Vina docking for the folate enzymes, kinases, and several other enzymes. The more accurate energy function and solvation terms of the MM/GBSA method allow it to achieve better enrichment, but rescoring is still limited by the ability of the docking method to generate poses with the correct binding modes.
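
The evaluation described above (keep the best score among the first N poses of each compound, then measure ROC AUC and early recovery of actives) can be illustrated with a minimal sketch. This is not the authors' pipeline code: the function names, the toy scores, and the use of an enrichment factor as the early-recovery measure are all assumptions made for illustration. The only convention taken as given is that docking and MM/GBSA scores are energies, so lower values rank better.

```python
from typing import Sequence


def best_pose_score(pose_scores: Sequence[float], n_poses: int) -> float:
    """Most favorable (lowest) energy among the first n_poses poses."""
    return min(pose_scores[:n_poses])


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """Probability that a random active scores lower than a random decoy.

    Ties count as half a win. The pairwise loop is O(actives * decoys),
    which is fine for a sketch but not for ~0.7 million compounds.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(
    scores: Sequence[float], is_active: Sequence[bool], fraction: float
) -> float:
    """Active rate in the top-ranked fraction divided by the overall active rate."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    overall_rate = sum(1 for a in is_active if a) / len(ranked)
    return (hits_top / n_top) / overall_rate


# Toy data (hypothetical): per-compound pose energies, in docking rank order.
poses = {
    "active_1": [-9.0, -8.5],
    "active_2": [-7.0, -9.5],
    "decoy_1": [-6.0, -6.5],
    "decoy_2": [-8.0, -5.0],
}
labels = [name.startswith("active") for name in poses]

for n in (1, 2):
    scores = [best_pose_score(p, n) for p in poses.values()]
    print(n, roc_auc(scores, labels), enrichment_factor(scores, labels, 0.5))
```

In this toy set, keeping two poses instead of one lets the second active recover a better-scoring pose, which raises both metrics. Real comparisons would use a rank-based AUC routine that scales to large decoy sets.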

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Binding Sites
  • Computational Biology
  • Computer-Aided Design
  • Crystallography, X-Ray
  • Databases, Chemical
  • Databases, Pharmaceutical
  • Drug Discovery / methods*
  • Drug Discovery / statistics & numerical data
  • Drug Evaluation, Preclinical / methods*
  • Drug Evaluation, Preclinical / statistics & numerical data
  • High-Throughput Screening Assays / methods*
  • High-Throughput Screening Assays / statistics & numerical data
  • Humans
  • Ligands
  • Models, Molecular
  • Molecular Dynamics Simulation
  • Proteins / chemistry
  • Proteins / metabolism
  • Software
  • User-Computer Interface*

Substances

  • Ligands
  • Proteins