Crowd-Sourced Chemistry: Considerations for Building a Standardized Database to Improve Omic Analyses

ACS Omega. 2020 Jan 9;5(2):980-985. doi: 10.1021/acsomega.9b03708. eCollection 2020 Jan 21.

Abstract

Mass spectrometry (MS) is used in multiple omics disciplines to generate large collections of data. These data enable advancements in biomedical research by providing global profiles of a given system. One of the main barriers to generating these profiles is the inability to accurately annotate omics data, especially for small molecules. To complement pre-existing large databases that remain incomplete, research groups devote effort to generating personal libraries to annotate their data. Scientific progress is impeded during the generation of these personal libraries because the data contained within them are often redundant and/or incompatible with other databases. To overcome these redundancies and incompatibilities, we propose that communal, crowd-sourced databases be curated in a standardized fashion. A small number of groups have shown that this model is feasible and successful. While the needs of a specific field will dictate the functionality of a communal database, we discuss some features to consider during database development. Special emphasis is placed on standardization of terminology, documentation, format, reference materials, and quality assurance practices. These standardization procedures enable a field to have higher confidence in the quality of the data within a given database. We also discuss the three conceptual pillars of database design as well as how crowd-sourcing is practiced. Generating open-source databases requires front-end effort, but the result is a well-curated, high-quality data set that all can use. Having such a resource fosters collaboration and scientific advancement.

Publication types

  • Review