Benchmarking of analysis strategies for data-independent acquisition proteomics using a large-scale dataset comprising inter-patient heterogeneity

Klemens Fröhlich; Eva Brombacher; Matthias Fahrner; Daniel Vogele; Lucas Kook; Niko Pinter; Peter Bronsert; Sylvia Timme-Bronsert; Alexander Schmidt; Katja Bärenfaller; Clemens Kreutz; Oliver Schilling

doi:10.1038/s41467-022-30094-0

Nat Commun. 2022 May 12;13(1):2622. doi: 10.1038/s41467-022-30094-0.

Authors

Klemens Fröhlich^#^{1,2,3}, Eva Brombacher^#^{2,3,4,5}, Matthias Fahrner^{1,2,3}, Daniel Vogele^{1,2}, Lucas Kook^{6,7}, Niko Pinter¹, Peter Bronsert^{1,8,9}, Sylvia Timme-Bronsert^{1,9}, Alexander Schmidt¹⁰, Katja Bärenfaller¹¹, Clemens Kreutz^{4,5}, Oliver Schilling^{12,13,14}

Affiliations

¹ Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany.
² Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany.
³ Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg im Breisgau, Germany.
⁴ Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany.
⁵ Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg im Breisgau, Germany.
⁶ Epidemiology, Biostatistics & Prevention Institute, University of Zurich, Zurich, Switzerland.
⁷ Institute for Data Analysis and Process Design, Zurich University of Applied Sciences, Winterthur, Switzerland.
⁸ German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany.
⁹ Tumorbank Comprehensive Cancer Center Freiburg, Medical Center University of Freiburg, Freiburg im Breisgau, Germany.
¹⁰ Proteomics Core Facility, Biozentrum, University of Basel, Basel, Switzerland.
¹¹ Swiss Institute of Allergy and Asthma Research (SIAF), University of Zurich, and Swiss Institute of Bioinformatics (SIB), Wolfgang, Switzerland.
¹² Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany. oliver.schilling@uniklinik-freiburg.de.
¹³ German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany. oliver.schilling@uniklinik-freiburg.de.
¹⁴ BIOSS Centre for Biological Signaling Studies, University of Freiburg, Freiburg im Breisgau, Germany. oliver.schilling@uniklinik-freiburg.de.

^# Contributed equally.

Abstract

Numerous software tools exist for data-independent acquisition (DIA) analysis of clinical samples, necessitating their comprehensive benchmarking. We present a benchmark dataset comprising real-world inter-patient heterogeneity, which we use for in-depth benchmarking of DIA data analysis workflows for clinical settings. Combining spectral libraries, DIA software, sparsity reduction, normalization, and statistical tests results in 1428 distinct data analysis workflows, which we evaluate based on their ability to correctly identify differentially abundant proteins. From our dataset, we derive bootstrap datasets of varying sample sizes and use the whole range of bootstrap datasets to robustly evaluate each workflow. We find that all DIA software suites benefit from using a gas-phase fractionated spectral library, irrespective of the library refinement used. Gas-phase fractionation-based libraries perform best against two out of three reference protein lists. Among all investigated statistical tests, non-parametric permutation-based tests consistently perform best.
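
As an illustration of two ideas named in the abstract, the following minimal Python sketch draws a bootstrap dataset of a chosen sample size and applies a two-sided permutation test to the difference in group means for a single protein. It is not the authors' implementation. The function names, the test statistic, the number of permutations, and the example intensities are all assumptions made for illustration only.

```python
import random
from statistics import mean


def bootstrap_samples(values, size, rng):
    """Draw `size` values with replacement (one bootstrap resample).

    Hypothetical helper: the abstract does not describe the actual
    resampling scheme in detail.
    """
    return [rng.choice(values) for _ in range(size)]


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose statistic is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up log2 intensities for one protein in two patient groups.
condition_a = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4]
condition_b = [12.0, 12.2, 11.9, 12.3, 12.1, 11.8]

rng = random.Random(42)
boot_a = bootstrap_samples(condition_a, 4, rng)  # smaller bootstrap dataset
boot_b = bootstrap_samples(condition_b, 4, rng)

p_full = permutation_test(condition_a, condition_b)
p_boot = permutation_test(boot_a, boot_b)
```

Repeating the resampling step over a range of sample sizes and many random draws would give a distribution of test outcomes per workflow, which is the general idea behind evaluating workflows on bootstrap datasets.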

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking*
Humans
Proteome / analysis
Proteomics* / methods
Software
Workflow

Substances

Proteome