A deep neural network-based approach for prediction of mutagenicity of compounds

Environ Sci Pollut Res Int. 2021 Sep;28(34):47641-47650. doi: 10.1007/s11356-021-14028-9. Epub 2021 Apr 24.

Abstract

We are exposed almost every day to chemical compounds present in the environment, cosmetics, and drugs. Mutagenicity is a key property in establishing a chemical compound's safety. Exposure to and handling of mutagenic chemicals in the environment pose a high health risk; therefore, identifying and screening these chemicals is essential. Given time constraints and the pressure to avoid the use of laboratory animals, a shift to alternative methodologies that enable rapid and cost-effective detection without undue over-conservatism seems critical. In this regard, computational detection and identification of mutagens in environmental samples such as drugs, pesticides, dyes, reagents, wastewater, cosmetics, and other substances is vital. Over the last two decades, there have been numerous efforts to develop prediction models for mutagenicity, and machine learning methods have so far demonstrated noteworthy performance and reliability. However, the accuracy of such prediction models has remained a major concern for researchers working in this area. In this study, mutagenicity prediction models were developed using a deep neural network (DNN), support vector machine, k-nearest neighbor, and random forest. The classifiers were trained on 3039 compounds and validated on 1014 compounds, each encoded as a 1597-dimensional molecular feature vector. The DNN-based prediction model yielded the highest prediction accuracy, 92.95% on the training data and 83.81% on the test data. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.894 and 0.838, respectively. The DNN-based classifier not only fits the data better than the traditional machine learning algorithms, viz., support vector machine, k-nearest neighbor, and random forest (with and without feature reduction), but also yields better performance metrics.
In the current work, we propose a DNN-based model to predict the mutagenicity of compounds.
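
For readers unfamiliar with the three metrics reported in the abstract (accuracy, area under the receiver operating characteristic curve, and area under the precision-recall curve), the following is a minimal pure-Python sketch of how each can be computed for a binary mutagen/non-mutagen classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made only for illustration, and the precision-recall area is approximated here by average precision, which may differ from the estimator used in the paper.

```python
# Illustrative sketch only: toy labels/scores, not data from the study.

def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise approximation of the area under the precision-recall
    curve: mean of the precision values at each true positive, taken
    in order of decreasing score (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical example: 1 = mutagen, 0 = non-mutagen.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # assumed model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))        # 0.75
print("ROC AUC:", roc_auc(y_true, scores))          # 0.6875
print("average precision:", average_precision(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas the two area metrics summarize ranking quality across all thresholds, which is one reason the abstract reports all three.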

Keywords: Deep learning; Deep neural network; Environmental exposure; Machine learning; Mutagen; Prediction.

MeSH terms

  • Animals
  • Machine Learning
  • Mutagens* / toxicity
  • Neural Networks, Computer*
  • Reproducibility of Results
  • Support Vector Machine

Substances

  • Mutagens