Precursor deconvolution error estimation: The missing puzzle piece in false discovery rate in top-down proteomics

Kyowon Jeong; Philipp T Kaulich; Wonhyeuk Jung; Jihyung Kim; Andreas Tholey; Oliver Kohlbacher

doi:10.1002/pmic.202300068

Proteomics. 2024 Feb;24(3-4):e2300068. doi: 10.1002/pmic.202300068. Epub 2023 Nov 23.

Authors

Kyowon Jeong¹ ², Philipp T Kaulich³, Wonhyeuk Jung⁴, Jihyung Kim¹ ², Andreas Tholey³, Oliver Kohlbacher¹ ² ⁵

Affiliations

¹ Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany.
² Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
³ Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany.
⁴ Department of Cell Biology, Yale School of Medicine, New Haven, Connecticut, USA.
⁵ Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany.

PMID: 37997224
DOI: 10.1002/pmic.202300068

Abstract

Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics (BUP), which relies on digested peptides and protein inference. While significant advances have been made in TDP sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis remains one of the major bottlenecks in TDP. A key step toward robust data analysis is establishing an objective estimate of the proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which was primarily established for BUP. We present evidence that TDA-based FDR estimation may not work at the proteoform level because of an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimates. We argue that the conventional TDA-based FDR in proteoform identification is in fact a protein-level FDR rather than a proteoform-level FDR unless the precursor deconvolution error rate is taken into account. To address this issue, we propose a formula that corrects the proteoform-level FDR bias by combining the TDA-based FDR with the precursor deconvolution error rate.

Keywords: FDR; deconvolution; false discovery rate; precursor; top-down proteomics.
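
The abstract says that a correction formula combines the TDA-based FDR with the precursor deconvolution error rate, but it does not state the formula. The sketch below is therefore only an illustration under explicit assumptions, not the authors' formula. It assumes that a proteoform identification is correct only when both the database match and the deconvolved precursor mass are correct, and that the two error sources are independent. The function names `tda_fdr` and `combined_fdr` are hypothetical.

```python
def tda_fdr(n_target: int, n_decoy: int) -> float:
    """Standard target-decoy estimate: decoy hits divided by target hits, capped at 1."""
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    if n_decoy < 0:
        raise ValueError("n_decoy must be non-negative")
    return min(1.0, n_decoy / n_target)


def combined_fdr(fdr_tda: float, deconv_error_rate: float) -> float:
    """Illustrative combination (an assumption, not the published formula).

    Treats an identification as correct only if the match is correct AND the
    precursor mass was deconvolved correctly, with independent error sources.
    """
    for value in (fdr_tda, deconv_error_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must lie in [0, 1]")
    return 1.0 - (1.0 - fdr_tda) * (1.0 - deconv_error_rate)


if __name__ == "__main__":
    fdr = tda_fdr(n_target=1000, n_decoy=10)      # 10 decoys among 1000 targets
    print(combined_fdr(fdr, deconv_error_rate=0.05))
```

Under these assumptions, even a small TDA-based FDR yields a noticeably larger combined estimate once the precursor deconvolution error rate is non-negligible, which is the qualitative point the abstract makes.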
