YummyData: providing high-quality open life science data

Yasunori Yamamoto; Atsuko Yamaguchi; Andrea Splendiani

doi:10.1093/database/bay022

Database (Oxford). 2018 Jan 1:2018:bay022. doi: 10.1093/database/bay022.

Authors

Yasunori Yamamoto¹, Atsuko Yamaguchi¹, Andrea Splendiani²

Affiliations

¹ Database Center for Life Science, Research Organization of Information and Systems, Kashiwa, Japan.
² Novartis Institutes for Biomedical Research, Basel, Switzerland.

Abstract

Many life science datasets are now available via Linked Data technologies, meaning that they are represented in a common format (the Resource Description Framework) and are accessible via standard APIs (SPARQL endpoints). While this is an important step toward developing an interoperable bioinformatics data landscape, it also creates a new set of obstacles, as it is often difficult for researchers to find the datasets they need. Different providers frequently offer the same datasets with different levels of support: besides differing in how up to date their data are, some providers add metadata describing the content, structure, and ontologies of the stored datasets, while others do not. We currently lack a place where researchers can go to easily assess datasets from different providers in terms of metrics such as service stability or metadata richness. We also lack a space for collecting feedback and improving data providers' awareness of user needs. To address this issue, we have developed YummyData, which consists of two components. One periodically polls a curated list of SPARQL endpoints, monitoring the states of their Linked Data implementations and content. The other presents the information measured for the endpoints and provides a forum for discussion and feedback. YummyData is designed to improve the findability and reusability of life science datasets provided as Linked Data and to foster the adoption of Linked Data. It is freely accessible at http://yummydata.org/. Database URL: http://yummydata.org/
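The abstract does not specify how the polling component checks an endpoint. The following is a minimal Python sketch of the general idea, built on assumptions that are not taken from the paper. It assumes liveness is probed with a SPARQL 1.1 Protocol GET request carrying the query `ASK { ?s ?p ?o }`, and that service stability is summarized as the fraction of successful polls. The names `poll`, `build_request`, `parse_ask`, `fetch`, `availability` and the endpoint URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical liveness probe (an assumption, not YummyData's documented check):
# true if the endpoint answers and holds at least one triple.
ASK_QUERY = "ASK { ?s ?p ?o }"


@dataclass
class PollResult:
    endpoint: str
    alive: bool


def build_request(endpoint: str, query: str = ASK_QUERY) -> urllib.request.Request:
    """SPARQL 1.1 Protocol query via GET: the query goes in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_ask(body: bytes) -> bool:
    """Read the boolean from a SPARQL JSON result document for an ASK query."""
    doc = json.loads(body)
    return isinstance(doc, dict) and doc.get("boolean") is True


def fetch(req: urllib.request.Request, timeout: float = 10.0) -> bytes:
    """Perform the HTTP request; network errors propagate to the caller."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def poll(
    endpoints: List[str],
    fetcher: Callable[[urllib.request.Request], bytes] = fetch,
) -> List[PollResult]:
    """Probe each endpoint once; any network or parse failure counts as down."""
    results = []
    for ep in endpoints:
        try:
            alive = parse_ask(fetcher(build_request(ep)))
        except (OSError, ValueError):  # URLError/timeouts are OSError; bad JSON is ValueError
            alive = False
        results.append(PollResult(ep, alive))
    return results


def availability(history: List[bool]) -> float:
    """Share of successful polls for one endpoint, a simple stability metric."""
    return sum(history) / len(history) if history else 0.0
```

In this sketch, a scheduler (for example a daily cron job, also an assumption) would call `poll` over the curated endpoint list and append each `alive` flag to a per-endpoint history that `availability` summarizes. The `fetcher` parameter keeps the network call swappable so the parsing and bookkeeping can be exercised offline.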

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biological Ontologies*
Computational Biology*
Data Curation*
Databases, Factual*
Metadata*