Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data

Priyanka Bhandary; Arun S Seetharam; Zebulun W Arendsee; Manhoi Hur; Eve Syrkin Wurtele

doi:10.1016/j.plantsci.2017.10.014

Plant Sci. 2018 Feb:267:32-47. doi: 10.1016/j.plantsci.2017.10.014. Epub 2017 Nov 7.

Authors

Priyanka Bhandary¹, Arun S Seetharam², Zebulun W Arendsee¹, Manhoi Hur¹, Eve Syrkin Wurtele³

Affiliations

¹ Dept. of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA.
² Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA 50011, USA.
³ Dept. of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA. Electronic address: mash@iastate.edu.

PMID: 29362097

Abstract

More than 15 petabases of raw RNA-seq data are now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data reuse provides a tremendous opportunity to extract new knowledge from existing experiments, and enables robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA-seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality through greater use of controlled vocabulary and through metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system.

Keywords: Meta-analysis; Metabolomics; Metadata; Orphan genes; Transcriptomics; 'Omics.

Publication types

Review

MeSH terms

Base Sequence*
Databases, Factual / statistics & numerical data*
Metadata / statistics & numerical data*
Plant Proteins* / genetics
RNA, Messenger* / genetics
Zea mays* / genetics

Substances

Plant Proteins
RNA, Messenger