TSCC: Two-Stage Combinatorial Clustering for virtual screening using protein-ligand interactions and physicochemical features

BMC Genomics. 2010 Dec 2;11 Suppl 4(Suppl 4):S26. doi: 10.1186/1471-2164-11-S4-S26.

Abstract

Background: The increasing numbers of 3D compounds and protein complexes stored in databases contribute greatly to current advances in biotechnology and are employed in several pharmaceutical and industrial applications. However, screening and retrieving appropriate candidates, as well as handling false positives, present a challenge for all post-screening analysis methods employed in retrieving therapeutic and industrial targets.

Results: Using the TSCC method, virtually screened compounds were clustered based on their protein-ligand interactions, followed by structure clustering employing physicochemical features, to retrieve the final compounds. Based on the protein-ligand interaction profile (first stage), docked compounds were clustered into groups with distinct binding interactions. Structure clustering (second stage) grouped similar compounds obtained from the first stage into clusters of similar structures, and the lowest-energy compound from each cluster was selected as a final candidate.
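The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the clustering algorithm, the similarity measures, or any cutoffs, so the single-linkage grouping, the Tanimoto and Euclidean measures, the values of `sim_cutoff` and `dist_cutoff`, the decision to run the second stage separately inside each first-stage group, and all compound names, residue labels, features, and energies below are assumptions made for illustration only.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two sets of interaction identifiers."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5


def single_linkage(items, linked):
    """Group items into connected components of the pairwise `linked` relation."""
    parent = {i: i for i in items}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(items, 2):
        if linked(i, j):
            parent[find(i)] = find(j)

    groups = {}
    for i in items:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def two_stage_select(compounds, sim_cutoff=0.5, dist_cutoff=1.0):
    """Return one lowest-energy representative per second-stage cluster."""
    names = list(compounds)

    # Stage 1 (assumed measure): group by similarity of interaction profiles.
    stage1 = single_linkage(
        names,
        lambda i, j: tanimoto(compounds[i]["interactions"],
                              compounds[j]["interactions"]) >= sim_cutoff,
    )

    selected = []
    for group in stage1:
        # Stage 2 (assumed measure): group by physicochemical feature distance.
        stage2 = single_linkage(
            group,
            lambda i, j: euclidean(compounds[i]["features"],
                                   compounds[j]["features"]) <= dist_cutoff,
        )
        for cluster in stage2:
            # Keep the lowest-energy compound of each cluster.
            selected.append(min(cluster, key=lambda n: compounds[n]["energy"]))
    return sorted(selected)


# Hypothetical docked compounds; every value here is made up for illustration.
compounds = {
    "c1": {"interactions": {"ASP25:hbond", "ILE50:vdw"},
           "features": (0.1, 0.2), "energy": -9.1},
    "c2": {"interactions": {"ASP25:hbond", "ILE50:vdw", "GLY27:hbond"},
           "features": (0.2, 0.1), "energy": -8.4},
    "c3": {"interactions": {"ASP25:hbond", "ILE50:vdw"},
           "features": (3.0, 2.5), "energy": -7.9},
    "c4": {"interactions": {"ARG8:hbond"},
           "features": (0.1, 0.2), "energy": -6.5},
}

selected = two_stage_select(compounds)
print(selected)
```

With this toy data, c1, c2, and c3 share a binding pattern and form one first-stage group, while c4 binds differently and stands alone. Inside the first group, c3 is structurally distant from c1 and c2, so the second stage yields the clusters {c1, c2}, {c3}, and {c4}, and the selected candidates are c1, c3, and c4.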

Conclusion: By representing interactions at the atomic level and including measures of interaction strength, better descriptions of protein-ligand interactions and a more specific analysis of virtual screening were achieved. The two-stage clustering approach enhanced our post-screening analysis, resulting in accurate performance in clustering, mining and visualizing compound candidates, thus improving virtual screening enrichment.
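One way to picture an atomic-level interaction profile that carries interaction strength is a mapping from (residue, atom) pairs to non-negative weights, compared with a weighted (min/max) form of the Tanimoto coefficient. This is a hedged sketch only: the abstract does not say how strength is measured or how profiles are compared, and the function name `weighted_tanimoto`, the residue and atom labels, and the weights are hypothetical.

```python
def weighted_tanimoto(p, q):
    """Min/max similarity between two profiles with non-negative weights.

    Each profile maps a (residue, atom) key to an interaction strength.
    Keys missing from a profile are treated as strength 0.
    """
    keys = set(p) | set(q)
    num = sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)
    den = sum(max(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)
    return num / den if den else 1.0


# Hypothetical profiles for two docked poses; labels and weights are made up.
pose_a = {("ASP25", "OD1"): 1.0, ("ILE50", "CD1"): 0.5}
pose_b = {("ASP25", "OD1"): 0.5, ("GLY27", "O"): 0.5}

similarity = weighted_tanimoto(pose_a, pose_b)
print(similarity)
```

Unlike a binary fingerprint, this measure separates two poses that contact the same atom with different strengths: the shared ASP25 contact contributes only its weaker weight to the numerator, so the similarity here is 0.5 / 2.0 = 0.25.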

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chemical Phenomena
  • Cluster Analysis*
  • Combinatorial Chemistry Techniques / methods*
  • Computer Simulation
  • Databases, Factual
  • High-Throughput Screening Assays
  • Humans
  • Ligands*
  • Protein Binding
  • Proteins / chemistry*
  • Structure-Activity Relationship

Substances

  • Ligands
  • Proteins