Prediction of Micronucleus Assay Outcome Using In Vivo Activity Data and Molecular Structure Features

Appl Biochem Biotechnol. 2021 Dec;193(12):4018-4034. doi: 10.1007/s12010-021-03720-8. Epub 2021 Oct 20.

Abstract

The in vivo micronucleus assay is a widely used genotoxicity test for determining the extent of chromosomal aberrations that chemicals cause in humans, and it plays a significant role in drug discovery. To reduce the uncertainty and expense of in vivo experiments, we aimed to develop novel machine learning-based tools that predict the toxicity of compounds with high precision. A total of 372 compounds with known toxicity information were retrieved from the PubChem Bioassay database and the literature. The fingerprints and descriptors of the compounds were generated using PaDEL and ChemSAR, respectively. Model performance was assessed with three tiers of evaluation: fivefold cross-validation, tenfold cross-validation, and validation on an external dataset. Further, structural alerts associated with the genotoxicity of the compounds were identified using the SARpy method. Of note, the fingerprint-based random forest model built in our analysis achieved the highest accuracy, about 0.97, during tenfold cross-validation. In essence, our study highlights that structural alerts such as chlorocyclohexane and trimethylamine are likely to be the leading cause of toxicity in humans. We believe that the random forest model generated in this study is suitable for reducing the number of test animals and should be considered in future work as part of good practice in animal welfare.
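The fivefold and tenfold cross-validation tiers described above can be illustrated with a minimal sketch. This is not the authors' code: the function names `k_fold_indices` and `cross_validate` are hypothetical, and the shuffling scheme is an assumption, since the abstract does not say how the 372 compounds were partitioned or whether the folds were stratified. The classifier is left as a caller-supplied function so that any model, such as a random forest trained on fingerprints, could be plugged in.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds (assumed scheme)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(labels, predict_fn, k, seed=0):
    """Return the mean held-out accuracy over k folds.

    predict_fn(train_idx, test_idx) is a hypothetical callback that trains on
    train_idx and returns one predicted label per entry of test_idx.
    """
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predictions = predict_fn(train_idx, test_idx)
        correct = sum(p == labels[j] for p, j in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

With 372 compounds, this partitioning gives fold sizes of 75, 75, 74, 74, 74 for fivefold and 38, 38, followed by eight folds of 37, for tenfold cross-validation. The third tier, external validation, would instead score the final model once on compounds that were never part of any fold.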

Keywords: Descriptors; Fingerprints; Machine learning; Structural alerts; Toxicity prediction.

MeSH terms

  • Animals
  • Biological Assay*
  • Computer Simulation*
  • Databases, Factual*
  • Humans
  • Machine Learning*
  • Micronucleus Tests*
  • Models, Biological*
  • Molecular Structure