Q-raKtion: A Semiautomated KNIME Workflow for Bioactivity Data Points Curation

J Chem Inf Model. 2022 Dec 26;62(24):6309-6315. doi: 10.1021/acs.jcim.2c01199. Epub 2022 Nov 28.

Abstract

The recent increase in bioactivity data freely available to the scientific community and stored as activity data points in chemogenomic repositories provides a huge amount of ready-to-use information to support the development of predictive models. However, the benefits of such a vast amount of accessible information are strongly counteracted by the lack of uniformity and consistency among data from multiple sources, which requires a process of integration and harmonization. While different automated pipelines for processing and assessing chemical data have emerged in recent years, the curation of bioactivity data points is a less investigated topic, with useful concepts provided but no tangible tools available. In this context, the present work represents a first step toward filling this gap by providing a tool that meets the needs of end users in building proprietary high-quality data sets for further studies. Specifically, we herein describe Q-raKtion, a systematic, semiautomated, flexible, and, above all, customizable KNIME workflow that effectively aggregates information on the biological activities of compounds retrieved from two of the most comprehensive and widely used repositories, PubChem and ChEMBL.
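
To make the harmonization problem concrete, the following is a minimal Python sketch of one way activity data points from two sources might be brought onto a common scale and aggregated. It is not the Q-raKtion workflow, which is implemented in KNIME and whose internal steps are not described in this abstract. The record fields (compound, target, value, unit), the unit table, the use of the median, and the max_spread threshold of 1 log unit are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical conversion factors from common concentration units to nanomolar.
UNIT_TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def to_pvalue(value, unit):
    """Convert a concentration to -log10 of its molar value (e.g. 100 nM -> 7)."""
    nanomolar = value * UNIT_TO_NM[unit]
    return 9.0 - math.log10(nanomolar)


def aggregate(records, max_spread=1.0):
    """Group data points by (compound, target) and summarize each group.

    A group is flagged as inconsistent when its values span more than
    max_spread log units (an assumed threshold, not taken from the paper).
    """
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["compound"], rec["target"])
        grouped[key].append(to_pvalue(rec["value"], rec["unit"]))

    summary = {}
    for key, pvalues in grouped.items():
        spread = max(pvalues) - min(pvalues)
        summary[key] = {
            "pvalue": median(pvalues),
            "n": len(pvalues),
            "consistent": spread <= max_spread,
        }
    return summary


# Made-up example records; identifiers and values are illustrative only.
EXAMPLE_RECORDS = [
    {"compound": "CPD-1", "target": "T1", "value": 100, "unit": "nM"},
    {"compound": "CPD-1", "target": "T1", "value": 0.1, "unit": "uM"},
    {"compound": "CPD-2", "target": "T1", "value": 10, "unit": "nM"},
    {"compound": "CPD-2", "target": "T1", "value": 10, "unit": "uM"},
]

EXAMPLE_SUMMARY = aggregate(EXAMPLE_RECORDS)
```

In this sketch the two CPD-1 records describe the same potency in different units and collapse to a single consistent value, while the two CPD-2 records differ by three log units and are flagged for manual review rather than silently averaged, which is one reason a curation step of this kind is often kept semiautomated.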

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Accuracy*
  • Workflow