Machine Learning Based Toxicity Prediction: From Chemical Structural Description to Transcriptome Analysis

Yunyi Wu; Guanyu Wang

doi:10.3390/ijms19082358

Int J Mol Sci. 2018 Aug 10;19(8):2358. doi: 10.3390/ijms19082358.

Authors

Yunyi Wu¹, Guanyu Wang²

Affiliations

¹ Department of Biology, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen 518055, China. wuyy3@mail.sustc.edu.cn.
² Department of Biology, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen 518055, China. wanggy@sustc.edu.cn.

Abstract

Toxicity prediction is very important to public health. Among its many applications, it is essential for reducing the cost and labor of a drug's preclinical and clinical trials, because many drug evaluations (cellular, animal, and clinical) can be spared when toxicity is predicted in advance. In the era of Big Data and artificial intelligence, toxicity prediction can benefit from machine learning, which has been widely used with excellent performance in fields such as natural language processing, speech recognition, image recognition, computational chemistry, and bioinformatics. In this article, we review machine learning methods that have been applied to toxicity prediction, including deep learning, random forests, k-nearest neighbors, and support vector machines. We also discuss the input to the machine learning algorithms, especially the shift from chemical structural description alone to structural description combined with human transcriptome data analysis, which can greatly enhance prediction accuracy.

Keywords: chemical structure; deep learning; machine learning; molecular fingerprint; molecular fragment; toxicity prediction; transcriptome.

Publication types

Review

MeSH terms

Animals
Gene Expression Profiling / methods*
Humans
Machine Learning*
Natural Language Processing*
Transcriptome*