Chemoinformatic approaches for navigating large chemical spaces

Expert Opin Drug Discov. 2024 Apr;19(4):403-414. doi: 10.1080/17460441.2024.2313475. Epub 2024 Feb 5.

Abstract

Introduction: Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation.

Areas covered: An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed.

Expert opinion: The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs requires not only methods that scale with size but also smart approaches that focus on better, not necessarily larger, molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It remains unclear whether these models can fulfill this ideal, because validation is difficult as long as the covered CSs remain mainly virtual and lack experimental verification.

Keywords: Chemical space; DNA-encoded chemical library; chemical space navigation; combinatorial chemistry; deep generative models; make-on-demand library; molecular representation; molecular similarity.

Publication types

  • Review

MeSH terms

  • Algorithms*
  • Cheminformatics*
  • Humans