Dataset containing physiological amounts of spike-in proteins into murine C2C12 background as a ground truth quantitative LC-MS/MS reference

Julian Uszkoreit; Katalin Barkovits; Sandra Pacharra; Kathy Pfeiffer; Simone Steinbach; Katrin Marcus; Martin Eisenacher

doi:10.1016/j.dib.2022.108435


Data Brief. 2022 Jul 4:43:108435. doi: 10.1016/j.dib.2022.108435. eCollection 2022 Aug.

Authors

Julian Uszkoreit¹,²,³; Katalin Barkovits¹,²; Sandra Pacharra⁴; Kathy Pfeiffer¹,²; Simone Steinbach¹,²; Katrin Marcus¹,²; Martin Eisenacher¹,²

Affiliations

¹ Medical Faculty, Medical Proteome Center, Ruhr University Bochum, Bochum 44801, Germany.
² Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany.
³ Institute of Bio- and Geosciences (IBG-5) - Computational Metagenomics, Forschungszentrum Jülich GmbH, ELIXIR Germany, Jülich 52425, Germany.
⁴ ProtaGene GmbH, Dortmund 44227, Germany.

Abstract

In this article, we present a data-dependent acquisition (DDA) dataset that was generated as a quantitative reference and ground-truth dataset. Although it was initially used to compare samples measured with DDA and data-independent acquisition (DIA) (Barkovits et al., 2020), the dataset can also serve as a benchmark reference for any workflow that processes DDA data. The entire dataset consists of 15 LC-MS/MS measurements covering five distinct spike-in states, each measured in three replicates. To generate the dataset, a C2C12 (immortalized mouse myoblast) cell lysate served as a complex background for five different states, which were simulated by spiking in 13 defined proteins at different concentrations. For this purpose, a constant amount of 20 µg of cell lysate was used for all samples, and the 13 selected proteins were added in amounts ranging from 0.1 to 10 pmol, reflecting physiological protein amounts. All samples were then tryptically digested using the same method. From each sample, 200 ng of tryptic peptides were measured in triplicate on a Q Exactive HF (Thermo Fisher Scientific). The MS1 mass range was set to 350-1400 m/z with a resolution of 60,000 at 200 m/z. HCD fragmentation of the ten most abundant precursor ions (Top10) was performed at 27% normalized collision energy (NCE). Fragment spectra (MS2) were acquired with a resolution of 30,000 at 200 m/z. In addition to the raw files, the dataset contains centroided mzML files and the peptide spectrum identification results obtained with Mascot (Perkins et al., 1999), MS-GF+ (Kim et al., 2010) and X!Tandem (Craig and Beavis, 2004) for each individual MS analysis. Also included are the corresponding FASTA file of protein sequences, a combination of all identification runs performed with PIA (Uszkoreit et al., 2015, 2019), and a peptide and protein quantification performed with OpenMS (Pfeuffer et al., 2017).
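Because every spike-in amount is known, the dataset can be used to score a quantification workflow against expected fold changes. The Python sketch below is only an illustration and is not part of the deposited data. It enumerates the 5 × 3 run layout and computes the ground-truth log2 ratio of a spike-in protein between two states from its spiked amounts. The per-state amounts of the individual proteins are not listed in this abstract, so the function names (`run_layout`, `expected_log2_ratio`, `ratio_errors`), the protein label "P1" and the observed value are hypothetical. The sketch also assumes that, because the C2C12 background is held constant at 20 µg, the measured ratio of a spike-in protein should track the ratio of its spiked amounts.

```python
import math
from itertools import product

# Design figures taken from the abstract; everything else is illustrative.
N_STATES = 5      # distinct spike-in states
N_REPLICATES = 3  # replicate measurements per state


def run_layout(n_states: int = N_STATES,
               n_replicates: int = N_REPLICATES) -> list[tuple[int, int]]:
    """Return one (state, replicate) pair per LC-MS/MS run, 1-based."""
    return list(product(range(1, n_states + 1), range(1, n_replicates + 1)))


def expected_log2_ratio(amount_a_pmol: float, amount_b_pmol: float) -> float:
    """Ground-truth log2 fold change of one spike-in protein between two states.

    Assumption: the measured ratio tracks the ratio of spiked amounts, since
    the C2C12 background is held constant across all samples.
    """
    if amount_a_pmol <= 0 or amount_b_pmol <= 0:
        raise ValueError("spike-in amounts must be positive")
    return math.log2(amount_a_pmol / amount_b_pmol)


def ratio_errors(observed_log2: dict[str, float],
                 amounts_a: dict[str, float],
                 amounts_b: dict[str, float]) -> dict[str, float]:
    """Observed minus expected log2 ratio for each protein (hypothetical inputs)."""
    return {
        protein: obs - expected_log2_ratio(amounts_a[protein], amounts_b[protein])
        for protein, obs in observed_log2.items()
    }


if __name__ == "__main__":
    print(len(run_layout()))  # 15 runs in total
    # Extremes of the stated 0.1-10 pmol range.
    print(expected_log2_ratio(10, 0.1))
    # Hypothetical quantification result for a hypothetical protein "P1".
    print(ratio_errors({"P1": 6.5}, {"P1": 10.0}, {"P1": 0.1}))
```

For example, the two ends of the stated range (10 pmol vs 0.1 pmol) correspond to an expected log2 ratio of log2(100), roughly 6.64. A workflow's deviation from such expected values, aggregated over the 13 spike-in proteins, is one simple way to compare quantification accuracy.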
All data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (Perez-Riverol et al., 2018) with the dataset identifier PXD012986.

Keywords: C2C12 cell line; Complex proteomics standard; Mass spectrometry; Protein spike-in dataset; Proteomics; Quantitative ground truth dataset.