In Silico Study of In Vitro GPCR Assays by QSAR Modeling

Methods Mol Biol. 2016:1425:361-81. doi: 10.1007/978-1-4939-3609-0_16.

Abstract

The US EPA's ToxCast program is screening thousands of chemicals of environmental interest in hundreds of in vitro high-throughput screening (HTS) assays. One goal is to prioritize chemicals for more detailed analyses based on activity in assays that target molecular initiating events (MIEs) of adverse outcome pathways (AOPs). However, the chemical space of interest for environmental exposure is much wider than ToxCast's chemical library. In silico methods such as quantitative structure-activity relationships (QSARs) are proven and cost-effective approaches for predicting the biological activity of untested chemicals, but empirical data are needed to build and validate them. ToxCast has developed datasets for about 2000 chemicals that are well suited to training and testing QSAR models. The overall goal of the present work was to develop QSAR models to fill the data gaps in larger environmental chemical lists. The specific aim was to build QSAR models for 18 G-protein-coupled receptor (GPCR) assays from the aminergic family. Two QSAR modeling strategies were adopted: classification models were developed first to separate chemicals into active and non-active classes, and regression models were then built to predict the potency values of the bioassays for the active chemicals. Multiple software programs were used to calculate constitutional, topological, and substructural molecular descriptors from two-dimensional (2D) chemical structures. Model-fitting methods included partial least squares discriminant analysis (PLSDA), support vector machines (SVMs), k-nearest neighbors (kNN), and partial least squares (PLS). Genetic algorithms (GAs) were applied to select the most predictive molecular descriptors for each assay. N-fold cross-validation (CV), coupled with fitting criteria based on multi-criteria decision-making, was used to evaluate the models. Finally, the models were applied to make predictions within the established chemical space limits.
The most accurate model was obtained for the bovine nonselective dopamine receptor (bDR_NS) GPCR assay: its classification balanced accuracy reached 0.96 in fitting and 0.95 in fivefold CV with only two latent variables. These results demonstrate that QSAR models can accurately predict the biological activity of chemicals for each of the studied assays.
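
The abstract reports balanced accuracy under fivefold CV but does not say how either was computed. The sketch below is illustrative only. It assumes the conventional definition of balanced accuracy (the mean of sensitivity and specificity) and a simple interleaved fold assignment. The function names balanced_accuracy and kfold_indices and the toy labels are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = active).

    Assumption: this is the conventional definition; the abstract does not
    state the exact formula used by the authors.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of non-actives recovered
    return (sensitivity + specificity) / 2.0


def kfold_indices(n_samples: int, n_folds: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for an n-fold split.

    Assumption: samples are assigned to folds in an interleaved, unshuffled
    way; the splitting scheme used in the study is not described.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is every sample that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Toy, made-up labels: 2 actives and 8 non-actives (an imbalanced assay).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.6875, versus a plain accuracy of 0.8
```

In this toy case the plain accuracy (0.8) is inflated by the majority non-active class, while the balanced accuracy (0.6875) weights both classes equally. That property is presumably why it is a natural metric for HTS data, where actives are usually a minority.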

Keywords: GPCR; Machine learning; QSAR; ToxCast; Toxicity.

MeSH terms

  • Animals
  • Cell-Free System
  • Computer Simulation
  • Humans
  • In Vitro Techniques
  • Least-Squares Analysis
  • Models, Molecular
  • Quantitative Structure-Activity Relationship
  • Receptors, G-Protein-Coupled / chemistry*
  • Software
  • Support Vector Machine
  • Toxicity Tests / methods*

Substances

  • Receptors, G-Protein-Coupled