From intuition to AI: evolution of small molecule representations in drug discovery

Miles McGibbon; Steven Shave; Jie Dong; Yumiao Gao; Douglas R Houston; Jiancong Xie; Yuedong Yang; Philippe Schwaller; Vincent Blay

doi:10.1093/bib/bbad422

Brief Bioinform. 2023 Nov 22;25(1):bbad422. doi: 10.1093/bib/bbad422.

Authors

Miles McGibbon¹, Steven Shave¹, Jie Dong², Yumiao Gao¹, Douglas R Houston¹, Jiancong Xie³, Yuedong Yang³, Philippe Schwaller⁴, Vincent Blay¹

Affiliations

¹ Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom.
² Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China.
³ Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China.
⁴ Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

Abstract

Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, invertibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.

Keywords: SMILES; artificial intelligence; autoencoders; drug discovery; machine learning; transformers.

MeSH terms

Drug Discovery*
Humans
Intuition*
Learning