Reliable CA-(Q)SAR generation based on entropy weight optimized by grid search and correction factors

Comput Biol Med. 2022 Jul:146:105573. doi: 10.1016/j.compbiomed.2022.105573. Epub 2022 Apr 30.

Abstract

Chromosome aberration (CA) is a serious form of compound-induced genotoxicity that can lead to carcinogenicity and developmental side effects. In this work, we developed a QSAR model for CA prediction using artificial intelligence methods. The model was built on an enlarged data set of 3208 compounds by optimizing machine learning and deep learning algorithms through iterative hyperparameter tuning, using multiple molecular fingerprint descriptors in combination with drug-like molecular properties (MP) screened by the entropy weight method, all on the open-source Python platform. In addition, a molecular similarity return search and a molecular connection index used as an additional descriptor were introduced so that the model could distinguish highly similar compounds and predict their CA outcome correctly. The final CA-(Q)SAR model achieved a prediction accuracy of 80.6%, and its bias is about 0.9793. Based on this model, we further analyzed the typical structural features within numerical intervals (MPI) of the molecular properties MW, XlogP, and TPSA that are associated with potential CA or non-CA toxicity at a normalized occurrence probability (NOP) above 70%, which may provide useful clues for designing leads or drug candidates devoid of CA genotoxicity.
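The abstract does not spell out how the entropy weight screening of molecular properties was carried out. As an illustration only, the following is a minimal sketch of the textbook entropy weight method, assuming min-max scaling of each property column; the function name `entropy_weights`, the handling of constant columns, and the toy input are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def entropy_weights(X):
    """Textbook entropy weight method (hypothetical helper, not the paper's code).

    X: array of shape (n_compounds, n_properties).
    Returns one weight per property; the weights sum to 1.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape

    # Min-max scale each column to [0, 1] (assumed normalization).
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    constant = col_range == 0
    scaled = (X - col_min) / np.where(constant, 1.0, col_range)

    # Turn each column into a probability distribution over compounds.
    # Constant columns carry no information, so treat them as uniform.
    col_sum = scaled.sum(axis=0)
    p = np.where(constant, 1.0 / n, scaled / np.where(col_sum == 0, 1.0, col_sum))

    # Shannon entropy per column, scaled to [0, 1]; 0 * log(0) is taken as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -plogp.sum(axis=0) / np.log(n)

    # Lower entropy means more divergence between compounds, hence more weight.
    divergence = np.clip(1.0 - entropy, 0.0, None)
    total = divergence.sum()
    if total == 0:
        # Every column is uninformative: fall back to equal weights.
        return np.full(m, 1.0 / m)
    return divergence / total

# Toy example: column 0 is constant, column 1 is concentrated in one compound,
# column 2 is spread evenly across compounds.
weights = entropy_weights([[1, 0, 0], [1, 0, 1], [1, 0, 2], [1, 1, 3]])
print(weights)
```

In this sketch a property that is identical for all compounds receives zero weight, and a property whose values are concentrated in a few compounds receives more weight than one spread evenly. Properties could then be ranked by weight and screened against a threshold, but the threshold and the ranking rule used in the study are not given in the abstract.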

Keywords: Chromosome aberration; Connection index; Machine learning; QSAR; Similarity return search.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Chromosome Aberrations
  • Entropy
  • Humans
  • Quantitative Structure-Activity Relationship*