Assessing Deep and Shallow Learning Methods for Quantitative Prediction of Acute Chemical Toxicity

Ruifeng Liu; Michael Madore; Kyle P Glover; Michael G Feasel; Anders Wallqvist

doi:10.1093/toxsci/kfy111

Toxicol Sci. 2018 Aug 1;164(2):512-526. doi: 10.1093/toxsci/kfy111.

Authors

Ruifeng Liu¹, Michael Madore¹, Kyle P Glover² ³, Michael G Feasel³, Anders Wallqvist¹

Affiliations

¹ Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702.
² Defense Threat Reduction Agency, Ft Belvoir, Virginia 22060.
³ U.S. Army - Edgewood Chemical Biological Center, Operational Toxicology, Aberdeen Proving Ground, Maryland 21010.

PMID: 29722883
DOI: 10.1093/toxsci/kfy111

Abstract

Animal-based methods for assessing chemical toxicity are struggling to meet testing demands. In silico approaches, including machine-learning methods, are promising alternatives. Recently, deep neural networks (DNNs) were evaluated and reported to outperform other machine-learning methods for quantitative structure-activity relationship modeling of molecular properties. However, most of the reported performance evaluations relied on global performance metrics, such as the root mean squared error (RMSE) between the predicted and experimental values of all samples, without considering the impact of sample distribution across the activity spectrum. Here, we carried out an in-depth analysis of DNN performance for quantitative prediction of acute chemical toxicity using several datasets. We found that the overall performance of DNN models on datasets of up to 30 000 compounds was similar to that of random forest (RF) models, as measured by the RMSE and correlation coefficients between the predicted and experimental results. However, our detailed analyses demonstrated that global performance metrics are inappropriate for datasets with a highly uneven sample distribution, because they are strongly biased toward the most densely populated regions of the toxicity spectrum. For highly toxic compounds, DNN and RF models trained on all samples performed much worse than the global performance metrics indicated. Surprisingly, our variable nearest neighbor method, which utilizes only structurally similar compounds to make predictions, performed reasonably well, suggesting that information from close neighbors in the training sets is a key determinant of acute toxicity predictions.
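The point about global metrics can be illustrated with a small sketch that compares a single RMSE over all samples with RMSE computed separately within bins of the experimental value. This is not the authors' code or data: the function names, the bin edges, and the synthetic values (a hypothetical toxicity scale on which larger numbers mean more toxic) are assumptions chosen only to show the effect.

```python
import math


def rmse(pred, obs):
    """Root mean squared error between predicted and experimental values."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def binned_rmse(pred, obs, edges):
    """Return {(lo, hi): (n_samples, rmse)} for each half-open bin [lo, hi)
    of the experimental value. Empty bins are skipped."""
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        pairs = [(p, o) for p, o in zip(pred, obs) if lo <= o < hi]
        if pairs:
            bin_pred, bin_obs = zip(*pairs)
            result[(lo, hi)] = (len(pairs), rmse(bin_pred, bin_obs))
    return result


# Synthetic illustration only (assumed values, not results from the paper):
# 90 low-toxicity samples predicted well, 10 high-toxicity samples whose
# predictions are pulled toward the majority.
obs = [2.0] * 90 + [5.0] * 10
pred = [2.1] * 90 + [3.0] * 10

global_rmse = rmse(pred, obs)
per_bin = binned_rmse(pred, obs, edges=[1.0, 3.0, 6.0])

print(f"global RMSE: {global_rmse:.2f}")
for (lo, hi), (n, err) in per_bin.items():
    print(f"bin [{lo}, {hi}): n={n}, RMSE={err:.2f}")
```

In this toy case the global RMSE is dominated by the 90 well-predicted samples, while the error in the sparsely populated high-toxicity bin is several times larger, which is the kind of discrepancy the abstract describes.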

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Animals
Computational Biology / methods*
Datasets as Topic
Deep Learning*
Machine Learning*
Mice
Neural Networks, Computer
Quantitative Structure-Activity Relationship
Rabbits
Rats
Toxicity Tests / methods*