Open-Source Machine Learning in Computational Chemistry

J Chem Inf Model. 2023 Aug 14;63(15):4505-4532. doi: 10.1021/acs.jcim.3c00643. Epub 2023 Jul 19.

Abstract

The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we survey 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand which topics within the field are being investigated with machine learning approaches. For each project, we provide a short description, a link to the code, the accompanying license type, and whether the training data and resulting models are publicly available. For the projects deposited in GitHub repositories, we identify the most popular Python libraries employed. We hope that this survey will serve as a resource for learning about machine learning, or specific architectures thereof, by identifying accessible codes with accompanying papers on a topic-by-topic basis. To this end, we also include open-source computational chemistry software for generating training data and fundamental Python libraries for machine learning. Based on our observations, and considering the three pillars of collaborative machine learning work (open data, open source code, and open models), we provide some suggestions to the community.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Computational Chemistry*
  • Machine Learning
  • Software*