Spiked proteomic standard dataset for testing label-free quantitative software and statistical methods

Claire Ramus; Agnès Hovasse; Marlène Marcellin; Anne-Marie Hesse; Emmanuelle Mouton-Barbosa; David Bouyssié; Sebastian Vaca; Christine Carapito; Karima Chaoui; Christophe Bruley; Jérôme Garin; Sarah Cianférani; Myriam Ferro; Alain Van Dorssaeler; Odile Burlet-Schiltz; Christine Schaeffer; Yohann Couté; Anne Gonzalez de Peredo

doi:10.1016/j.dib.2015.11.063

Data Brief. 2015 Dec 17:6:286-94. doi: 10.1016/j.dib.2015.11.063. eCollection 2016 Mar.

Authors

Claire Ramus¹, Agnès Hovasse², Marlène Marcellin³, Anne-Marie Hesse¹, Emmanuelle Mouton-Barbosa³, David Bouyssié³, Sebastian Vaca², Christine Carapito², Karima Chaoui³, Christophe Bruley¹, Jérôme Garin¹, Sarah Cianférani², Myriam Ferro¹, Alain Van Dorssaeler², Odile Burlet-Schiltz³, Christine Schaeffer², Yohann Couté¹, Anne Gonzalez de Peredo³

Affiliations

¹ ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France.
² ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France.
³ ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France.

Abstract

This data article describes a controlled, spiked proteomic dataset for which the "ground truth" of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and a low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, for testing new algorithms for label-free quantitative analysis, and for evaluating downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with the identifier PXD001819. Starting from some of the raw files of this dataset, we also provide processed data obtained with various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used here for data processing takes advantage of the different spike levels introduced in the samples composing the dataset, and the processed data are merged into a single file to facilitate the evaluation and illustration of software tool results for the detection of variant proteins with different absolute expression levels and fold change values.
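
As an illustration of how a known ground truth supports this kind of evaluation, the following minimal Python sketch scores one pipeline's list of proteins called variant against the set of spiked proteins. It is not taken from the dataset or the accompanying manuscript: the function name `benchmark_calls`, the protein identifiers, and the choice to restrict scoring to proteins the pipeline actually quantified are all assumptions made for the example.

```python
def benchmark_calls(called, spiked, quantified):
    """Score differential-abundance calls against a known ground truth.

    called     -- protein IDs the pipeline reports as variant
    spiked     -- protein IDs that truly vary (here, the spiked standard)
    quantified -- all protein IDs the pipeline quantified

    Assumption: only quantified proteins are scored, so a spiked protein
    that was never quantified does not count as a missed detection.
    """
    quantified = set(quantified)
    called = set(called) & quantified
    spiked = set(spiked) & quantified

    true_pos = called & spiked    # spiked proteins correctly called variant
    false_pos = called - spiked   # background proteins wrongly called variant

    # Sensitivity: fraction of quantified spiked proteins that were detected.
    sensitivity = len(true_pos) / len(spiked) if spiked else float("nan")
    # False discovery proportion: fraction of calls that are background.
    fdp = len(false_pos) / len(called) if called else 0.0

    return {
        "tp": len(true_pos),
        "fp": len(false_pos),
        "sensitivity": sensitivity,
        "fdp": fdp,
    }


# Hypothetical identifiers, for illustration only.
quantified = {"UPS_A", "UPS_B", "UPS_C", "UPS_D", "YEAST_1", "YEAST_2", "YEAST_3"}
spiked = {"UPS_A", "UPS_B", "UPS_C", "UPS_D"}
called = {"UPS_A", "UPS_B", "UPS_C", "YEAST_1"}

result = benchmark_calls(called, spiked, quantified)
print(result)
```

In this toy case three of the four spiked proteins are recovered and one yeast protein is called in error, giving a sensitivity of 0.75 and a false discovery proportion of 0.25. Repeating the calculation for each pair of spike levels would show how detection changes with fold change and absolute abundance.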