Endocrine disruption: the noise in available data adversely impacts the models' performance

SAR QSAR Environ Res. 2021 Feb;32(2):111-131. doi: 10.1080/1062936X.2020.1864468. Epub 2021 Jan 19.

Abstract

This paper is devoted to the analysis of available experimental data and the preparation of predictive models for the binding affinity of molecules with respect to two nuclear receptors involved in endocrine disruption (ED): the oestrogen (ER) and the androgen (AR) receptors. The ED-relevant data were retrieved from multiple sources, including the CERAPP, CoMPARA, and Tox21 projects as well as the ChEMBL and PubChem databases. Data analysis performed with the help of generative topographic mapping revealed low agreement between experimental values from different sources. The collected data were used to train both classification models for ER and AR binding activities and regression models for relative binding affinity (RBA) and median inhibition concentration (IC50). These models displayed relatively poor performance in classification (sensitivities ER = 0.34, AR = 0.49) and in regression (the determination coefficient r² for the RBA and IC50 models in external validation varied from 0.44 to 0.76). Our analysis demonstrates that the models' low performance resulted from misinterpreted experimental endpoints or wrongly reported values, thus confirming the observations reported in the CERAPP and CoMPARA studies. The developed models and the collected data sets, which comprise 6215 (ER) and 3789 (AR) unique compounds, are freely available.
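
For readers less familiar with the two performance measures quoted above, the following minimal sketch shows how sensitivity and a determination coefficient are commonly computed. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, binders are assumed to be labelled 1 and non-binders 0, and the determination coefficient uses the common definition 1 - SS_res/SS_tot, which may differ from the exact formula used in the paper.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true binders (label 1) that the model also predicts as binders."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p != 1)
    if true_pos + false_neg == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return true_pos / (true_pos + false_neg)


def determination_coefficient(y_obs, y_pred):
    """Assumed definition: 1 - SS_res / SS_tot on the external validation set."""
    if len(y_obs) != len(y_pred) or not y_obs:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    if ss_tot == 0:
        raise ValueError("determination coefficient is undefined for constant observations")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy labels and values, made up purely for illustration.
    print(sensitivity([1, 1, 1, 0], [1, 0, 0, 0]))
    print(determination_coefficient([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

A sensitivity of 0.34 under this definition would mean that roughly one in three experimentally active compounds is recognised as active by the classifier.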

Keywords: QSAR/QSPR; REACH; endocrine disruptors; generative topographic mapping; oestrogen/androgen receptor.

MeSH terms

  • Endocrine Disruptors / chemistry*
  • Humans
  • Models, Theoretical
  • Quantitative Structure-Activity Relationship*
  • Receptors, Androgen / chemistry*
  • Receptors, Estrogen / chemistry*

Substances

  • Endocrine Disruptors
  • Receptors, Androgen
  • Receptors, Estrogen