AI/ML Models to Predict the Severity of Drug-Induced Liver Injury for Small Molecules

Chem Res Toxicol. 2023 Jul 17;36(7):1129-1139. doi: 10.1021/acs.chemrestox.3c00098. Epub 2023 Jun 9.

Abstract

Drug-induced liver injury (DILI), believed to be a multifactorial toxicity, has been a leading cause of attrition of small molecules during discovery, clinical development, and postmarketing. Early identification of DILI risk reduces the costs and cycle times associated with drug development. In recent years, several groups have reported predictive models that use physicochemical properties or in vitro and in vivo assay endpoints; however, these approaches have not accounted for interactions between liver-expressed proteins and drug molecules. To address this gap, we have developed an integrated artificial intelligence/machine learning (AI/ML) model to predict DILI severity for small molecules using a combination of physicochemical properties and off-target interactions predicted in silico. We compiled a data set of 603 diverse compounds from public databases. Among them, 164 were categorized by the FDA as Most DILI (M-DILI), 245 as Less DILI (L-DILI), and 194 as No DILI (N-DILI). Multiple machine learning methods were used to create a consensus model for predicting DILI potential: k-nearest neighbor (k-NN), support vector machine (SVM), random forest (RF), Naïve Bayes (NB), artificial neural network (ANN), logistic regression (LR), weighted average ensemble learning (WA), and penalized logistic regression (PLR). Among the analyzed ML methods, SVM, RF, LR, WA, and PLR identified M-DILI and N-DILI compounds, achieving a receiver operating characteristic area under the curve of 0.88, sensitivity of 0.73, and specificity of 0.9. Approximately 43 off-targets, along with physicochemical properties (fsp3, log S, basicity, reactive functional groups, and predicted metabolites), were identified as significant factors in distinguishing between M-DILI and N-DILI compounds. The key off-targets that we identified include PTGS1, PTGS2, SLC22A12, PPARγ, RXRA, CYP2C9, AKR1C3, MGLL, RET, AR, and ABCC4. The present AI/ML computational approach therefore demonstrates that the integration of physicochemical properties and predicted on- and off-target biological interactions can significantly improve DILI predictivity compared to chemical properties alone.
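
As a rough illustration of the weighted average ensemble and the reported metrics (ROC AUC, sensitivity, specificity), the following Python sketch combines per-model probabilities for a binary M-DILI versus N-DILI task. It is not the authors' implementation: the probabilities, the weights, the 0.5 threshold, and all function names are hypothetical and chosen only to show how the quantities are computed.

```python
from typing import List, Sequence, Tuple


def weighted_average(probs_by_model: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Combine per-model probabilities into one consensus probability per compound."""
    total = sum(weights)
    n_compounds = len(probs_by_model[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, probs_by_model)) / total
        for i in range(n_compounds)
    ]


def sensitivity_specificity(y_true: Sequence[int], y_prob: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = M-DILI, 0 = N-DILI (not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0]
svm_probs = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
rf_probs = [0.8, 0.7, 0.3, 0.4, 0.1, 0.2]
lr_probs = [0.7, 0.5, 0.6, 0.6, 0.3, 0.2]
weights = [2.0, 1.0, 1.0]  # assumed weights, for illustration only

consensus = weighted_average([svm_probs, rf_probs, lr_probs], weights)
sens, spec = sensitivity_specificity(y_true, consensus)
auc = roc_auc(y_true, consensus)
print(consensus, sens, spec, auc)
```

In practice the weights would typically be derived from each model's validation performance, and the decision threshold would be tuned to balance sensitivity against specificity.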

MeSH terms

  • Artificial Intelligence
  • Bayes Theorem
  • Chemical and Drug Induced Liver Injury*
  • Databases, Factual
  • Humans
  • Machine Learning
  • Organic Anion Transporters*
  • Organic Cation Transport Proteins

Substances

  • SLC22A12 protein, human
  • Organic Anion Transporters
  • Organic Cation Transport Proteins