Building a global genomics observatory: Using GEOME (the Genomic Observatories Metadatabase) to expedite and improve deposition and retrieval of genetic data and metadata for biodiversity research

Mol Ecol Resour. 2020 Nov;20(6):1458-1469. doi: 10.1111/1755-0998.13269. Epub 2020 Oct 27.

Abstract

Genetic data represent a relatively new frontier for our understanding of global biodiversity. Ideally, such data should include both organismal DNA-based genotypes and the ecological context where the organisms were sampled. Yet most tools and standards for data deposition focus exclusively on either genetic or ecological attributes. The Genomic Observatories Metadatabase (GEOME: geome-db.org) provides an intuitive solution for maintaining links between genetic data sets stored by the International Nucleotide Sequence Database Collaboration (INSDC) and their associated ecological metadata. GEOME facilitates the deposition of raw genetic data to the INSDC's Sequence Read Archive (SRA) while maintaining persistent links to standards-compliant ecological metadata held in the GEOME database. This approach facilitates findable, accessible, interoperable and reusable data archival practices. Moreover, GEOME enables data management solutions for large collaborative groups and expedites batch retrieval of genetic data from the SRA. The article that follows describes how GEOME can enable genuinely open data workflows for researchers in the field of molecular ecology.

Keywords: FAIR principles; bioinformatics; ecoinformatics; genomic; open data; reproducible research.

MeSH terms

  • Biodiversity*
  • Databases, Nucleic Acid*
  • Ecology
  • Genomics*
  • Information Storage and Retrieval
  • Metadata*
  • Research*
  • Workflow