Ligand-Based Virtual Screening Based on the Graph Edit Distance

Int J Mol Sci. 2021 Nov 25;22(23):12751. doi: 10.3390/ijms222312751.

Abstract

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of elements: nodes and edges. Nodes are individual components, and edges are relations between these components. Here, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. To obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance computes this distance and is defined as the minimum cost of transforming one graph into the other. Nevertheless, for this dissimilarity to be meaningful, the transformation costs must be properly tuned. The aim of this paper is to analyse structure-based screening methods to verify the quality of the transformation costs proposed by Harper, and to present an algorithm that learns these transformation costs so that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is measured by classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that, with our learned costs, we obtain the highest ratios in identifying bioactivity similarity in a structurally diverse group of molecules.
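To make the idea of an edit-cost-based dissimilarity concrete, the following is a minimal sketch, not the authors' implementation. It computes an exact Graph Edit Distance between tiny attributed graphs by exhaustively enumerating node mappings (each node of the first graph is substituted by a node of the second or deleted; unmatched nodes of the second graph are inserted), then scores the distance by leave-one-out 1-nearest-neighbour classification accuracy. Everything beyond the abstract is an assumption: the node labels ("R", "D", "A", "H"), the bond labels, the cost values in the hypothetical `Costs` class (they are placeholders, not Harper's costs), the 1-NN protocol, and the hypothetical `pick_costs` helper, which is a naive grid-search stand-in for cost learning rather than the paper's learning algorithm. Exhaustive search is only feasible for very small graphs.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Costs:
    # Hypothetical placeholder edit costs (NOT the Harper costs).
    node_sub: float = 1.0   # substitute a node by one with a different label
    node_del: float = 1.0
    node_ins: float = 1.0
    edge_sub: float = 0.5   # substitute a bond by one with a different label
    edge_del: float = 0.5
    edge_ins: float = 0.5

def _key(a, b):
    # Undirected edges are stored with the smaller node index first.
    return (a, b) if a < b else (b, a)

def mapping_cost(g1, g2, m, c):
    """Cost of the edit path induced by mapping m (m[i] = node of g2, or None = delete)."""
    (n1, e1), (n2, e2) = g1, g2
    cost = 0.0
    for i, t in enumerate(m):
        if t is None:
            cost += c.node_del
        elif n1[i] != n2[t]:
            cost += c.node_sub
    inv = {t: i for i, t in enumerate(m) if t is not None}
    cost += c.node_ins * (len(n2) - len(inv))          # unmatched g2 nodes are inserted
    for (i, j), lab in e1.items():
        if m[i] is not None and m[j] is not None and _key(m[i], m[j]) in e2:
            if e2[_key(m[i], m[j])] != lab:
                cost += c.edge_sub
        else:
            cost += c.edge_del
    for (u, v) in e2:                                   # g2 edges with no g1 counterpart
        if not (u in inv and v in inv and _key(inv[u], inv[v]) in e1):
            cost += c.edge_ins
    return cost

def ged(g1, g2, c):
    """Exact GED by brute force over all node mappings (tiny graphs only)."""
    slots = list(range(len(g2[0]))) + [None] * len(g1[0])
    return min(mapping_cost(g1, g2, m, c) for m in set(permutations(slots, len(g1[0]))))

def loo_1nn_accuracy(graphs, labels, c):
    """Leave-one-out 1-NN accuracy: an assumed way to score a dissimilarity."""
    correct = 0
    for i, g in enumerate(graphs):
        _, nearest = min((ged(g, h, c), j) for j, h in enumerate(graphs) if j != i)
        correct += labels[nearest] == labels[i]
    return correct / len(graphs)

def pick_costs(graphs, labels, candidates):
    # Naive stand-in for cost learning: keep the candidate with the best accuracy.
    return max(candidates, key=lambda c: loo_1nn_accuracy(graphs, labels, c))

# Hypothetical reduced graphs: (node labels, {(i, j): bond label}).
g_a = (["R", "D", "A"], {(0, 1): "single", (0, 2): "single"})
g_b = (["R", "D", "A"], {(0, 1): "single", (0, 2): "double"})
g_c = (["H", "H"], {(0, 1): "single"})
g_d = (["H", "H", "H"], {(0, 1): "single", (1, 2): "single"})
graphs = [g_a, g_b, g_c, g_d]
labels = ["active", "active", "inactive", "inactive"]

if __name__ == "__main__":
    c = Costs()
    print(ged(g_a, g_b, c))                       # one bond substitution
    print(loo_1nn_accuracy(graphs, labels, c))
    print(pick_costs(graphs, labels, [Costs(), Costs(node_sub=0.1)]))
```

Enumerating mappings keeps the sketch short and exact; for realistic molecule sets, approximate GED solvers are the usual choice because exact GED is computationally hard.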

Keywords: extended reduced graph; graph edit distance; machine learning; molecular similarity; structure activity relationships; virtual screening.

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Computer Graphics*
  • Ligands
  • Models, Theoretical*
  • User-Computer Interface*

Substances

  • Ligands