Synergy between machine learning and natural products cheminformatics: Application to the lead discovery of anthraquinone derivatives

Chem Biol Drug Des. 2022 Aug;100(2):185-217. doi: 10.1111/cbdd.14062. Epub 2022 May 8.

Abstract

Cheminformatics utilizing machine learning (ML) techniques has opened up a new horizon in drug discovery. This is owing to the vast expansion of chemical space, with rapidly growing numbers of expected hits and lead compounds that match druggable macromolecular targets, in particular from natural compounds. Owing to their structural complexity, uniqueness, and diversity, natural products (NPs) could occupy a bigger space in pharmaceuticals, allowing the industry to pursue more selective leads in the nanomolar range of binding affinity. ML is an essential part of each step of the drug design pipeline, such as target prediction, compound library preparation, and lead optimization. Notably, molecular mechanics and molecular dynamics simulations, induced-fit docking, and free energy perturbation are essential in predicting best binding poses, binding free energy values, and molecular mechanics force fields. Those applications have benefited from artificial intelligence (AI), which decreases the computational cost of such simulations. This review aimed to describe chemical space and compound libraries related to NPs. High-throughput screening used for fractionating NPs and high-throughput virtual screening, together with their strategies and significance, were reviewed. Particular emphasis was given to AI approaches, ML tools, algorithms, and techniques, especially in drug discovery of macrocyclic compounds, and to approaches in computer-aided and ML-based drug discovery. Anthraquinone derivatives were discussed as a source of new lead compounds that can be developed using ML tools for diverse medicinal uses, such as the treatment of cancer, infectious diseases, and metabolic disorders. Furthermore, the power of principal component analysis in understanding relevant protein conformations and the molecular modeling of protein-ligand interactions were also presented. Apart from being a concise reference for cheminformatics, this review is a useful text for understanding the application of ML-based algorithms to molecular dynamics simulation and to in silico absorption, distribution, metabolism, excretion, and toxicity prediction.

Keywords: anthraquinone; artificial intelligence; cheminformatics; deep learning; high-throughput screening; machine learning; molecular modeling; natural products; protein conformation.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Anthraquinones / pharmacology
  • Artificial Intelligence
  • Biological Products* / chemistry
  • Biological Products* / pharmacology
  • Cheminformatics
  • Machine Learning
  • Molecular Dynamics Simulation

Substances

  • Anthraquinones
  • Biological Products