Virtual screening system for finding structurally diverse hits by active learning

J Chem Inf Model. 2008 Apr;48(4):930-40. doi: 10.1021/ci700085q. Epub 2008 Mar 20.

Abstract

Two virtual screening strategies based on active learning were devised: "query by bagging" (QBag) and "query by bagging with descriptor-sampling" (QBagDS). The QBag strategy generates multiple structure-activity relationship rules by bagging and selects compounds to improve the rules. To find many structurally diverse hits, the QBagDS strategy generates rules by bagging with descriptor sampling. Both strategies can also use prior knowledge about hits to improve efficiency at the beginning of screening. We performed simulation experiments and clustering analysis for several G-protein-coupled receptors and showed that the QBag and QBagDS strategies outperform the conventional similarity-based strategy and that using both descriptor sampling and prior knowledge is effective for finding many hits. We applied the bagging with descriptor sampling strategy to novel hit finding, and 4 of the 10 selected compounds showed high inhibition.
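The idea of bagging with descriptor sampling can be illustrated with a minimal sketch. The abstract does not specify the rule learner, the descriptor set, or the selection criterion, so everything concrete here is an assumption made for illustration: each "rule" is a nearest-centroid classifier, labels are 0 (inactive) and 1 (active), and compounds are selected where the committee's votes are most evenly split, which is the usual query-by-committee choice and not necessarily the criterion used in the paper. All function names are hypothetical.

```python
import random


def train_centroid_rule(X, y, features):
    """Fit a nearest-centroid rule using only the given descriptor indices.

    Stand-in for a structure-activity rule learner (an assumption).
    Returns None if the training sample lacks one of the two classes.
    """
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:
            return None
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return features, centroids


def predict(rule, x):
    """Return 1 (active) if x is closer to the active centroid, else 0."""
    features, centroids = rule

    def sq_dist(label):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))

    return 1 if sq_dist(1) < sq_dist(0) else 0


def build_committee(X, y, n_rules, n_features, rng):
    """Train n_rules rules, each on a bootstrap sample of the compounds
    (bagging) and a random subset of n_features descriptors (descriptor
    sampling)."""
    if len(set(y)) < 2:
        raise ValueError("need at least one active and one inactive compound")
    n_desc = len(X[0])
    committee = []
    while len(committee) < n_rules:
        idx = [rng.randrange(len(X)) for _ in X]      # sample with replacement
        feats = rng.sample(range(n_desc), n_features)  # descriptor subset
        rule = train_centroid_rule([X[i] for i in idx], [y[i] for i in idx], feats)
        if rule is not None:  # skip bootstrap samples that missed a class
            committee.append(rule)
    return committee


def vote_fraction(committee, x):
    """Fraction of rules in the committee that predict x as active."""
    return sum(predict(rule, x) for rule in committee) / len(committee)


def select_batch(committee, pool, batch_size):
    """Return indices of the pool compounds whose vote fraction is closest
    to 0.5, i.e. where the committee disagrees most (assumed criterion)."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(vote_fraction(committee, pool[i]) - 0.5),
    )
    return ranked[:batch_size]
```

In an active-learning loop under these assumptions, the compounds returned by `select_batch` would be assayed, added to the labeled set, and the committee rebuilt for the next round. Setting `n_features` equal to the total number of descriptors reduces the sketch to plain bagging without descriptor sampling.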