Toward a Sample Metadata Standard in Public Proteomics Repositories

J Proteome Res. 2020 Oct 2;19(10):3906-3909. doi: 10.1021/acs.jproteome.0c00376. Epub 2020 Sep 22.

Abstract

Metadata is essential in proteomics data repositories and is crucial for interpreting and reanalyzing the deposited data sets. For every proteomics data set, at least three levels of metadata should be captured: (i) the data set description, (ii) the information linking samples to data files, and (iii) standard data file formats (e.g., mzIdentML, mzML, or mzTab). While the data set description and standard data file formats are supported by all ProteomeXchange partners, the information linking samples to data files is mostly missing. Recently, members of the European Bioinformatics Community for Mass Spectrometry (EuBIC) created an open-source project called Sample to Data file format for Proteomics (https://github.com/bigbio/proteomics-metadata-standard/) to enable the standardization of sample metadata of public proteomics data sets. Here, the project is presented to the proteomics community, and we call for contributors, including researchers, journals, and consortia, to provide feedback about the format. We believe this work will improve reproducibility and facilitate the development of new tools dedicated to proteomics data analysis.
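
To make the second metadata level concrete, the following is a minimal sketch of what a sample to data file mapping could look like and how a tool might use it. It assumes a simple tab-delimited table; the column names (`sample`, `organism`, `data_file`), the file names, and both helper functions are hypothetical and are not taken from the project's specification.

```python
import csv
import io

# Hypothetical column names and file names, chosen for illustration only;
# they are not taken from the project's specification.
TABLE = (
    "sample\torganism\tdata_file\n"
    "sample_1\tHomo sapiens\trun_01.mzML\n"
    "sample_1\tHomo sapiens\trun_02.mzML\n"
    "sample_2\tHomo sapiens\trun_03.mzML\n"
)


def files_by_sample(text):
    """Group data file names by the sample they are linked to."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        mapping.setdefault(row["sample"], []).append(row["data_file"])
    return mapping


def unannotated_files(text, deposited):
    """Return deposited files that no row of the table links to a sample."""
    annotated = {f for files in files_by_sample(text).values() for f in files}
    return sorted(set(deposited) - annotated)
```

Under these assumptions, `files_by_sample(TABLE)` groups the three runs under their two samples, and `unannotated_files` flags any deposited file that lacks a sample annotation, which is the kind of gap the abstract describes.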

Keywords: bioinformatics; data reanalysis; data repositories; experimental design; multiomics; open data; ProteomeXchange; proteomics; reproducibility; sample metadata; standards.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Mass Spectrometry
  • Metadata*
  • Proteomics*
  • Reproducibility of Results
  • Software