Reaction Data Curation I: Chemical Structures and Transformations Standardization

Mol Inform. 2021 Dec;40(12):e2100119. doi: 10.1002/minf.202100119. Epub 2021 Aug 24.

Abstract

The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not yet been discussed extensively in the literature. Here, we suggest a four-step protocol that covers the curation of individual structures (reactants and products), chemical transformations, reaction conditions, and endpoints. Its implementation in Python 3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO, and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
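The four-step protocol can be pictured as a chain of filters, where each step either returns a cleaned record or rejects it. The sketch below shows only that control flow. It does not use the CGRTools API and it is not the authors' implementation. Several pieces are hypothetical placeholders: the `ReactionRecord` layout, the step functions, the rules inside them, and the temperature and yield bounds. Real structure standardization (aromaticity, charges, tautomers, atom mapping) requires a cheminformatics toolkit.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ReactionRecord:
    # Hypothetical record layout; Reaxys, USPTO and Pistachio each use their own formats.
    reactants: tuple[str, ...]          # structures as SMILES strings
    products: tuple[str, ...]
    temperature_c: Optional[float] = None
    yield_pct: Optional[float] = None


# A step returns a (possibly modified) record, or None to reject it.
Step = Callable[[ReactionRecord], Optional[ReactionRecord]]


def _clean_side(side: Iterable[str]) -> tuple[str, ...]:
    # Strip whitespace, drop empty entries, deduplicate while keeping order.
    out: list[str] = []
    for s in side:
        s = s.strip()
        if s and s not in out:
            out.append(s)
    return tuple(out)


def curate_structures(r: ReactionRecord) -> Optional[ReactionRecord]:
    # Step 1 (placeholder rule): clean each side; reject if a side is empty.
    reactants, products = _clean_side(r.reactants), _clean_side(r.products)
    if not reactants or not products:
        return None
    return replace(r, reactants=reactants, products=products)


def curate_transformation(r: ReactionRecord) -> Optional[ReactionRecord]:
    # Step 2 (placeholder rule): remove products that appear unchanged among the
    # reactants; reject if nothing is transformed. Plain string comparison
    # assumes the SMILES were already canonicalized.
    products = tuple(p for p in r.products if p not in r.reactants)
    if not products:
        return None
    return replace(r, products=products)


def curate_conditions(r: ReactionRecord, t_min: float = -100.0,
                      t_max: float = 500.0) -> Optional[ReactionRecord]:
    # Step 3 (placeholder rule): reject implausible temperatures (hypothetical bounds).
    if r.temperature_c is not None and not (t_min <= r.temperature_c <= t_max):
        return None
    return r


def curate_endpoint(r: ReactionRecord) -> Optional[ReactionRecord]:
    # Step 4 (placeholder rule): a reported yield must lie in [0, 100] %.
    if r.yield_pct is not None and not (0.0 <= r.yield_pct <= 100.0):
        return None
    return r


PIPELINE: tuple[Step, ...] = (curate_structures, curate_transformation,
                              curate_conditions, curate_endpoint)


def curate(records: Iterable[ReactionRecord],
           steps: tuple[Step, ...] = PIPELINE) -> list[ReactionRecord]:
    # Run every record through the steps in order; keep only survivors.
    kept: list[ReactionRecord] = []
    for r in records:
        current: Optional[ReactionRecord] = r
        for step in steps:
            current = step(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept
```

For example, consider four records. The first is an esterification with a duplicated reactant and stray whitespace. The second lists benzene as both reactant and product. The third reports a 140 % yield. The fourth has an empty product. Passing all four to `curate` keeps only the first record, with its reactants reduced to `("CCO", "CC(=O)O")`. Keeping each step as a separate function means that a rule can be swapped or tightened without touching the others.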

Keywords: Pistachio; Reaxys; USPTO; big data; chemical reactions; data cleaning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Curation*
  • Databases, Factual
  • Reference Standards