Do we want our data raw? Including binary mass spectrometry data in public proteomics data repositories

Lennart Martens; Alexey I Nesvizhskii; Henning Hermjakob; Marcin Adamski; Gilbert S Omenn; Joël Vandekerckhove; Kris Gevaert

doi:10.1002/pmic.200401302

Proteomics. 2005 Aug;5(13):3501-5. doi: 10.1002/pmic.200401302.

Authors

Lennart Martens¹, Alexey I Nesvizhskii, Henning Hermjakob, Marcin Adamski, Gilbert S Omenn, Joël Vandekerckhove, Kris Gevaert

Affiliation

¹ Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium. lennart.martens@UGent.be

PMID: 16041670

Abstract

With the pilot phase of the human Plasma Proteome Project (PPP) completed, the largest and most ambitious proteomics experiment to date has reached its first milestone. The correspondingly large volume of data produced by this pilot project emphasized the need for a centralized dissemination mechanism. It led to the development of a detailed, PPP-specific data gathering infrastructure at the University of Michigan, Ann Arbor, as well as the protein identifications database project at the European Bioinformatics Institute as a general proteomics data repository. One issue that cropped up while discussing which data to store for the PPP is whether to store the raw, binary data coming from the mass spectrometers or the more compact, already significantly processed peak lists. As this debate is not restricted to the PPP but concerns the proteomics community in general, we attempt to detail the relative merits and caveats of centralized storage and dissemination of raw data and/or peak lists, building on the extensive experience gained during the PPP pilot phase. Finally, we make some suggestions for both immediate and future storage of MS data in public repositories.

MeSH terms

Computational Biology
Database Management Systems
Databases, Protein*
Europe
Humans
Information Storage and Retrieval
Information Systems
Internet
Mass Spectrometry / methods*
Peptide Mapping
Peptides / chemistry
Pilot Projects
Proteomics / methods*
Sequence Analysis, Protein / methods
Software
Statistics as Topic / methods*

Substances

Peptides