ADMET Evaluation in Drug Discovery. Part 17: Development of Quantitative and Qualitative Prediction Models for Chemical-Induced Respiratory Toxicity

Tailong Lei; Fu Chen; Hui Liu; Huiyong Sun; Yu Kang; Dan Li; Youyong Li; Tingjun Hou

doi:10.1021/acs.molpharmaceut.7b00317

Mol Pharm. 2017 Jul 3;14(7):2407-2421. doi: 10.1021/acs.molpharmaceut.7b00317. Epub 2017 Jun 21.

Authors

Tailong Lei¹, Fu Chen¹, Hui Liu¹, Huiyong Sun¹, Yu Kang¹, Dan Li¹, Youyong Li², Tingjun Hou¹ ³

Affiliations

¹ College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China.
² Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, P. R. China.
³ State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China.

PMID: 28595388
DOI: 10.1021/acs.molpharmaceut.7b00317

Abstract

As a dangerous end point, respiratory toxicity can cause serious adverse health effects and even death. Meanwhile, it is a common and traditional issue in occupational and environmental protection. Pharmaceutical and chemical industries have a strong urge to develop precise and convenient computational tools to evaluate the respiratory toxicity of compounds as early as possible. Most of the reported theoretical models were developed based on the respiratory toxicity data sets with one single symptom, such as respiratory sensitization, and therefore these models may not afford reliable predictions for toxic compounds with other respiratory symptoms, such as pneumonia or rhinitis. Here, based on a diverse data set of mouse intraperitoneal respiratory toxicity characterized by multiple symptoms, a number of quantitative and qualitative prediction models with high reliability were developed by machine learning approaches. First, a four-tier dimension reduction strategy was employed to find an optimal set of 20 molecular descriptors for model building. Then, six machine learning approaches were used to develop the prediction models, including relevance vector machine (RVM), support vector machine (SVM), regularized random forest (RRF), extreme gradient boosting (XGBoost), naïve Bayes (NB), and linear discriminant analysis (LDA). Among all of the models, the SVM regression model shows the most accurate quantitative predictions for the test set (q²_ext = 0.707), and the XGBoost classification model achieves the most accurate qualitative predictions for the test set (MCC of 0.644, AUC of 0.893, and global accuracy of 82.62%). The application domains were analyzed, and all of the tested compounds fall within the application domain coverage. We also examined the structural features of the compounds and important fragments with large prediction errors. In conclusion, the SVM regression model and the XGBoost classification model can be employed as accurate prediction tools for respiratory toxicity.
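
The abstract reports its test-set results as q²_ext, MCC, AUC, and global accuracy. The following is a minimal sketch of how such metrics are commonly computed, not the authors' implementation. The function names are hypothetical, and the q²_ext formula shown (one minus the squared prediction error on the test set divided by the squared deviation of the test values from the training-set mean) is one widely used definition; the abstract does not state which variant the paper uses.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(y_true, y_pred):
    """Global accuracy: fraction of compounds classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def q2_ext(y_test, y_pred, y_train_mean):
    """External q^2 (assumed definition): 1 - PRESS / SS around the training mean."""
    press = sum((y - yh) ** 2 for y, yh in zip(y_test, y_pred))
    ss = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the paper.
    labels = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print("MCC:", mcc(labels, predicted))
    print("Accuracy:", accuracy(labels, predicted))
    print("AUC:", auc(labels, scores))
    print("q2_ext:", q2_ext([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], 2.0))
```

MCC is often preferred over plain accuracy when the toxic and nontoxic classes are imbalanced, because it uses all four cells of the confusion matrix. The pairwise AUC shown here is quadratic in the number of compounds, which is fine for a sketch but slow for large test sets.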

Keywords: dimension reduction; extreme gradient boosting; machine learning; quantitative structure−activity relationship; respiratory system toxicity.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Bayes Theorem
Humans
Machine Learning*
Mice
Models, Theoretical
Quantitative Structure-Activity Relationship
Reproducibility of Results
Support Vector Machine