The Experimentalist's Guide to Machine Learning for Small Molecule Design

ACS Appl Bio Mater. 2024 Feb 19;7(2):657-684. doi: 10.1021/acsabm.3c00054. Epub 2023 Aug 3.

Abstract

Initially part of the field of artificial intelligence, machine learning (ML) has become a booming research area since branching out into its own field in the 1990s. After three decades of refinement, ML algorithms have accelerated scientific developments across a variety of research topics. The field of small molecule design is no exception, and an increasing number of researchers are applying ML techniques to discover, generate, and optimize small molecule compounds. The goal of this review is to provide simple yet descriptive explanations of some of the ML algorithms most commonly used in small molecule design, along with those that are highly applicable to an experimentally focused audience. The algorithms discussed here span three ML paradigms: supervised learning, unsupervised learning, and ensemble methods. Examples from the published literature are provided for each algorithm. Some common pitfalls of applying ML to biological and chemical data sets are also explained, alongside a brief summary of a few more advanced paradigms, including reinforcement learning and semi-supervised learning.

Keywords: QSAR; data analysis; drug design; experimentalist friendly; machine learning; small molecule design.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Machine Learning*