CapsCarcino: A novel sparse data deep learning tool for predicting carcinogens

Food Chem Toxicol. 2020 Jan:135:110921. doi: 10.1016/j.fct.2019.110921. Epub 2019 Oct 25.

Abstract

Determining chemical carcinogenicity in the early stages of drug discovery is fundamentally important to prevent the adverse effects of carcinogens on human health. There has been a recent surge of interest in developing computational approaches to predict chemical carcinogenicity. However, the predictive power of many existing approaches is limited, and there is plenty of room for improvement. Here, we develop a new deep learning architecture, termed CapsCarcino, to distinguish between carcinogens and noncarcinogens. CapsCarcino is built on a dynamic routing algorithm that requires less data, extracts more comprehensive information, and does not require feature selection. We find that CapsCarcino outperforms five other machine learning models, with significantly improved predictive and generalization ability. Specifically, the best CapsCarcino model achieves an accuracy of 85.0% on an external validation dataset. In addition, we find that this advantage over the other methods is robust and can be achieved using sparse datasets: when trained on merely 20% of the dataset, CapsCarcino performs comparably to the other methods trained on the full training dataset. Further mechanism analysis indicates that CapsCarcino can efficiently learn the characteristics of carcinogens even when structural alerts are insufficiently represented. The results indicate that CapsCarcino should be helpful for carcinogen risk assessment.
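
The abstract names dynamic routing as the basis of the architecture but does not give details. For orientation, the generic routing-by-agreement procedure used in capsule networks can be sketched as below. This is a minimal illustration and not the authors' implementation: the function names, the array shapes, the three routing iterations, and the random input standing in for learned prediction vectors are all assumptions.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Rescale each vector so its length lies in [0, 1), keeping its direction."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, num_iters=3):
    """Generic routing-by-agreement between two capsule layers.

    u_hat: array of shape (num_in, num_out, dim) holding the prediction that
           each input capsule makes for each output capsule (hypothetical layout).
    Returns the output capsules, shape (num_out, dim).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax of the logits over output capsules.
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # Weighted sum of predictions for each output capsule, then squash.
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)
        # Raise the logit where a prediction agrees with the output (dot product).
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v


# Assumed toy setup: 8 input capsules routed to 2 output capsules
# (e.g. carcinogen vs. noncarcinogen), each of dimension 4.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 2, 4))
v = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=-1)  # capsule length acts as a class score
print(lengths)
```

In a capsule classifier of this kind, the length of each output capsule is typically read as the score for the corresponding class, so the longer of the two vectors would give the predicted label.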

Keywords: Capsule network; Carcinogenicity; Computational toxicology; Deep learning; Predictive classifier.

MeSH terms

  • Animals
  • Carcinogens / chemistry*
  • Databases, Chemical / statistics & numerical data
  • Deep Learning*
  • Rats

Substances

  • Carcinogens