Data shaving: a focused screening approach

J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):470-9. doi: 10.1021/ci030025s.

Abstract

The number of compounds available for evaluation as part of the drug discovery process continues to increase. These compounds may exist physically or be stored electronically, allowing screening by either actual or virtual means. This growing number of compounds has generated an increasing need for effective strategies to direct screening efforts. Initial efforts toward this goal led to the development of methods to select diverse sets of compounds for screening, methods to cluster actives into related groups of compounds, and tools to select compounds similar to actives of interest for further screening. In this work we extend these earlier efforts by exploiting information about inactive compounds to help make rational decisions about which sets of compounds to include in a continuing screening campaign or in a focused follow-up effort. The method uses the information from inactive compounds to "shave" off, or deprioritize, compounds similar to inactives from further consideration. This methodology can be used in two ways: first, to provide a rational means of deciding when sufficient compounds containing certain structural features have been tested, and second, as a tool to enhance similarity searching around known actives. Similarity searching is improved by deprioritizing compounds that are predicted to be inactive because they contain structural features associated with inactivity.
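The shaving idea described above can be illustrated with a minimal sketch. The abstract does not specify the descriptors, the similarity measure, or any cutoff, so everything below is an assumption made for illustration: compounds are represented as sets of named structural features, similarity is the Tanimoto coefficient on those sets, and `shave_threshold` is a hypothetical parameter. It is not the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a compound is described by a set of structural feature labels.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two feature sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(fp: Fingerprint, references: List[Fingerprint]) -> float:
    """Highest similarity of fp to any reference compound (0.0 if none)."""
    return max((tanimoto(fp, ref) for ref in references), default=0.0)


def shave_and_rank(
    candidates: Dict[str, Fingerprint],
    actives: List[Fingerprint],
    inactives: List[Fingerprint],
    shave_threshold: float = 0.7,  # hypothetical cutoff, not from the paper
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Deprioritize candidates close to known inactives, rank the rest.

    Returns (kept, shaved): kept is a list of (name, similarity to nearest
    active) sorted from most to least similar; shaved lists the names of
    candidates whose similarity to some inactive reached the threshold.
    """
    kept: List[Tuple[str, float]] = []
    shaved: List[str] = []
    for name, fp in candidates.items():
        if max_similarity(fp, inactives) >= shave_threshold:
            shaved.append(name)
        else:
            kept.append((name, max_similarity(fp, actives)))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept, shaved


# Toy data with made-up feature labels, for illustration only.
actives = [frozenset({"amide", "phenyl", "pyridine"})]
inactives = [frozenset({"nitro", "phenyl", "ester"})]
candidates = {
    "c1": frozenset({"amide", "phenyl", "pyridine", "methyl"}),
    "c2": frozenset({"nitro", "phenyl", "ester"}),
    "c3": frozenset({"amide", "furan"}),
}

kept, shaved = shave_and_rank(candidates, actives, inactives)
print(kept)    # candidates retained, ranked by similarity to actives
print(shaved)  # candidates deprioritized as too similar to inactives
```

In this toy data, `c2` matches a known inactive exactly and is shaved, while `c1` and `c3` remain and are ordered by how closely they resemble the known active. A real workflow would more likely use a cheminformatics toolkit's fingerprints in place of hand-written feature sets, and the cutoff would need to be tuned against screening data.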