Modular Software for Generating and Modeling Diverse Polymer Databases

J Chem Inf Model. 2023 Jun 26;63(12):3761-3771. doi: 10.1021/acs.jcim.3c00081. Epub 2023 Jun 8.

Abstract

Machine learning methods offer the opportunity to design new functional materials on an unprecedented scale; however, building the large, diverse molecular databases needed to train such methods remains a daunting task. Automated computational chemistry modeling workflows are therefore becoming essential tools in this data-driven search for materials with novel properties, because they allow molecular databases to be created and curated without significant user input. This mitigates well-founded concerns regarding data provenance, reproducibility, and replicability. We have developed PySoftK (Python Soft Matter at King's College London), a versatile software package that provides automated computational workflows to create, model, and curate libraries of polymers with minimal user intervention. PySoftK is available as an efficient, fully tested, and easily installable Python package. Key features of the software include the wide range of polymer topologies that can be generated automatically and its fully parallelized library generation tools. We anticipate that PySoftK will support the generation, modeling, and curation of large polymer libraries for functional materials discovery in the nanotechnology and biotechnology arenas.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Factual
  • Humans
  • Reproducibility of Results
  • Software*