PREFER: A New Predictive Modeling Framework for Molecular Discovery

J Chem Inf Model. 2023 Aug 14;63(15):4497-4504. doi: 10.1021/acs.jcim.3c00523. Epub 2023 Jul 24.

Abstract

Machine-learning and deep-learning models have been extensively used in cheminformatics to predict molecular properties, to reduce the need for direct measurements, and to accelerate compound prioritization. However, the variety of setups and frameworks and the large number of molecular representations make it difficult to properly evaluate, reproduce, and compare these models. Here we present a new PREdictive modeling FramEwoRk for molecular discovery (PREFER), written in Python (version 3.7.7) and based on AutoSklearn (version 0.14.7), that allows comparison between different molecular representations and common machine-learning models. We provide an overview of the design of our framework and show exemplary use cases and results of several representation-model combinations on diverse data sets, both public and in-house. Finally, we discuss the use of PREFER on small data sets. The code of the framework is freely available on GitHub.
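
The idea of scoring every representation-model combination under one shared evaluation protocol can be illustrated with a minimal, standard-library-only sketch. It does not use PREFER's or AutoSklearn's API: the featurizers (`char_counts`, `length_only`), the two regressors, the toy SMILES list, and the synthetic target are all hypothetical stand-ins chosen so that the example is self-contained. The target is computed from the strings themselves and is not a measured property. Note that `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Callable, Dict, List, Tuple

# Toy SMILES strings (assumption: illustrative only, not a benchmark data set).
SMILES = ["CCO", "CCCO", "CCN", "c1ccccc1", "CC(=O)O",
          "CCCC", "CCOC", "c1ccccc1O", "CN", "CCCN"]

ALPHABET = "CNOcn=()1"  # characters counted by the char_counts representation


def synthetic_target(smiles: str) -> float:
    """Synthetic label derived from the string; NOT an experimental value."""
    return float(smiles.upper().count("C") + 2 * smiles.count("O"))


# --- Two hypothetical molecular representations ---------------------------
def char_counts(smiles: str) -> List[float]:
    return [float(smiles.count(ch)) for ch in ALPHABET]


def length_only(smiles: str) -> List[float]:
    return [float(len(smiles))]


# --- Two simple models sharing a fit/predict interface --------------------
class MeanRegressor:
    """Baseline that ignores the features and predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighborRegressor:
    """1-nearest-neighbor regression with Euclidean distance."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            dists = [math.dist(x, xi) for xi in self.X_]
            preds.append(self.y_[dists.index(min(dists))])
        return preds


def cross_validated_rmse(featurize: Callable, make_model: Callable,
                         smiles: List[str], targets: List[float],
                         n_folds: int = 5) -> float:
    """Pooled RMSE over deterministic folds (index modulo n_folds)."""
    X = [featurize(s) for s in smiles]
    squared_errors = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        model = make_model().fit([X[i] for i in train_idx],
                                 [targets[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        squared_errors += [(p - targets[i]) ** 2
                           for p, i in zip(preds, test_idx)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def compare_combinations() -> Dict[Tuple[str, str], float]:
    """Score every (representation, model) pair with the same folds."""
    representations = {"char_counts": char_counts, "length_only": length_only}
    models = {"mean": MeanRegressor, "1-NN": NearestNeighborRegressor}
    targets = [synthetic_target(s) for s in SMILES]
    return {(rep_name, model_name):
            cross_validated_rmse(featurize, model_cls, SMILES, targets)
            for rep_name, featurize in representations.items()
            for model_name, model_cls in models.items()}


if __name__ == "__main__":
    for (rep_name, model_name), rmse in sorted(compare_combinations().items()):
        print(f"{rep_name:12s} {model_name:5s} RMSE = {rmse:.3f}")
```

Because every combination is evaluated on identical folds, differences in the reported error come from the representation or the model rather than from the split. In a real setting the featurizers would be replaced by chemically meaningful representations and the models by tuned learners.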

MeSH terms

  • Cheminformatics*
  • Machine Learning*