A conceptual framework for quality assessment and management of biodiversity data

PLoS One. 2017 Jun 28;12(6):e0178731. doi: 10.1371/journal.pone.0178731. eCollection 2017.

Abstract

The increasing availability of digitized biodiversity data worldwide, provided by an increasing number of institutions and researchers, and the growing use of those data for a variety of purposes have raised concerns related to the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports, and decisions. A consistent approach to assessing and managing data quality is currently critical for biodiversity data users. However, achieving this goal has been particularly challenging because of idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed until the quality needs have been clearly established from a data user's standpoint. This paper defines a formal conceptual framework that allows the biodiversity informatics community to describe the meaning of "fitness for use" from a data user's perspective in a common and standardized manner. The proposed framework defines nine concepts organized into three classes: DQ Needs, DQ Solutions and DQ Report. The framework is intended to formalize human thinking into well-defined components so that concepts of DQ needs, solutions and reports can be shared and reused in a common way among user communities. With this framework, we establish a common ground for the collaborative development of solutions for DQ assessment and management based on data fitness for use principles. To validate the framework, we present a proof of concept based on a case study at the Museum of Comparative Zoology of Harvard University. In future work, we will use the framework to engage the biodiversity informatics community in formalizing and sharing DQ profiles related to DQ needs across the community.

MeSH terms

  • Biodiversity*
  • Computational Biology

Grants and funding

The study is part of the doctoral thesis of AKV, who was supported by a Brazilian governmental national research agency, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). In addition, AKV was granted an exchange scholarship by the Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq (grant #233676/2014-7). AMS is a professor at Universidade de São Paulo, which supported the Research Center on Biodiversity and Computing, BioComp (grant #11.1.9359.1.2). AMS was also supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (grants #308326/2010-5 and #311531/2014-8) and by the São Paulo Research Foundation, FAPESP (grant #2015/241683). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.