High-Performance Prediction of Human Estrogen Receptor Agonists Based on Chemical Structures

Molecules. 2017 Apr 23;22(4):675. doi: 10.3390/molecules22040675.

Abstract

Many estrogen receptor agonists are known to disrupt endocrine function. We have developed a computational model that predicts agonist activity against the estrogen receptor ligand-binding domain in an assay system. Our model was entered into the Tox21 Data Challenge 2014, a computational toxicology competition organized by the National Center for Advancing Translational Sciences. The competition sought high-performance predictive models for various adverse-outcome pathways, including the estrogen receptor pathway. Our predictive model, which is based on the random forest method, delivered the best performance in its competition category. In the current study, we improved the predictive performance of the random forest models by strictly tuning the hyperparameters to avoid overfitting. The models were optimized on 4000 descriptors applied simultaneously to the 10,000 activity assay results for the estrogen receptor ligand-binding domain, which were measured and compiled by Tox21. Given the correlation between our model's results and those of the challenge, we consider that our model currently has the highest predictive power for agonist activity at the estrogen receptor ligand-binding domain. Furthermore, analysis of the optimized model revealed some important features of the agonists, such as the number of hydroxyl groups in the molecule.
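
The workflow summarized above (a random forest classifier, hyperparameters chosen on held-out data to limit overfitting, and inspection of which descriptors matter) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a small pure-Python stand-in run on synthetic data, and every name in it (`make_toy_data`, `fit_forest`, `roc_auc`, the depth grid, the "hydroxyl count" column) is a hypothetical choice made for illustration. Scoring by ROC AUC is likewise an assumption, suggested by the MeSH terms rather than stated in the abstract. Real work at the scale of 4000 descriptors would use an established machine-learning library.

```python
import random

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y, features, min_leaf):
    """Return (impurity_decrease, feature, threshold) for the best split, or None."""
    n, parent, best = len(y), gini(y), None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, f, t)
    return best

def build_tree(X, y, rng, depth, min_leaf, n_sub, importance, n_total):
    """Grow one tree recursively; a leaf stores the fraction of actives."""
    leaf = sum(y) / len(y)
    if depth == 0 or len(set(y)) == 1:
        return leaf
    # Only a random subset of descriptors is considered at each node.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub), min_leaf)
    if split is None:
        return leaf
    gain, f, t = split
    importance[f] += gain * len(y) / n_total  # impurity-decrease importance
    lo = [i for i in range(len(y)) if X[i][f] <= t]
    hi = [i for i in range(len(y)) if X[i][f] > t]
    args = (rng, depth - 1, min_leaf, n_sub, importance, n_total)
    return (f, t,
            build_tree([X[i] for i in lo], [y[i] for i in lo], *args),
            build_tree([X[i] for i in hi], [y[i] for i in hi], *args))

def fit_forest(X, y, n_trees, max_depth, min_leaf, n_sub, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training set."""
    rng, n = random.Random(seed), len(y)
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, max_depth, min_leaf, n_sub, importance, n))
    return trees, importance

def predict(trees, row):
    """Average the leaf values (estimated activity probability) over all trees."""
    total = 0.0
    for node in trees:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        total += node
    return total / len(trees)

def roc_auc(y_true, scores):
    """AUC as the probability that an active outranks an inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_toy_data(n, rng, n_features=6, flip=0.1):
    """Synthetic data (assumption): column 0 mimics a hydroxyl count, the rest is noise."""
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 4)] + [rng.randint(0, 9) for _ in range(n_features - 1)]
        label = int(row[0] >= 2)
        if rng.random() < flip:  # label noise, so that deep trees can overfit
            label = 1 - label
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(42)
X_train, y_train = make_toy_data(200, rng)
X_valid, y_valid = make_toy_data(100, rng)

# Hyperparameter selection: keep the tree depth that scores best on held-out data.
results = {}
for max_depth in (1, 3, 8):
    trees, importance = fit_forest(X_train, y_train, n_trees=25,
                                   max_depth=max_depth, min_leaf=5, n_sub=3)
    auc = roc_auc(y_valid, [predict(trees, r) for r in X_valid])
    results[max_depth] = (auc, importance)

best_depth = max(results, key=lambda d: results[d][0])
best_auc, best_importance = results[best_depth]
top_feature = max(range(len(best_importance)), key=best_importance.__getitem__)
print(f"best max_depth={best_depth}, validation AUC={best_auc:.3f}, top descriptor={top_feature}")
```

Selecting the depth on a validation split rather than on the training data is the part that guards against overfitting. The accumulated impurity decrease per descriptor is one common way to ask a fitted forest which inputs it relied on; whether the authors used this particular measure is not stated in the abstract.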

Keywords: QSAR prediction model; Tox21 Data Challenge 2014; estrogen receptor; machine learning; random forest.

MeSH terms

  • Area Under Curve
  • Decision Trees
  • Estradiol Congeners / chemistry*
  • Humans
  • Machine Learning
  • Models, Chemical
  • Quantitative Structure-Activity Relationship
  • ROC Curve
  • Receptors, Estrogen / chemistry*

Substances

  • Estradiol Congeners
  • Receptors, Estrogen