Empirical Bayesian random censoring threshold model improves detection of differentially abundant proteins

Frank Koopmans; L Niels Cornelisse; Tom Heskes; Tjeerd M H Dijkstra

doi:10.1021/pr500171u

J Proteome Res. 2014 Sep 5;13(9):3871-80. doi: 10.1021/pr500171u. Epub 2014 Aug 22.

Authors

Frank Koopmans¹, L Niels Cornelisse, Tom Heskes, Tjeerd M H Dijkstra

Affiliation

¹ Department of Functional Genomics, Center for Neurogenomics and Cognitive Research, VU University, 1081 HV Amsterdam, The Netherlands.

PMID: 25102230
DOI: 10.1021/pr500171u

Abstract

A challenge in proteomics is that many observations are missing, with the probability of missingness increasing as abundance decreases. Adjusting for this informative missingness is required to assess accurately which proteins are differentially abundant. We propose an empirical Bayesian random censoring threshold (EBRCT) model that takes the pattern of missingness into account when identifying differential abundance. We compare our model with four alternatives: one that treats the missing values as missing completely at random (MCAR model), one with a fixed censoring threshold for each protein species (fixed censoring model), and two imputation models, k-nearest neighbors (IKNN) and singular value thresholding (SVTI). We demonstrate that the EBRCT model outperforms all alternative models when applied to the CPTAC study 6 benchmark data set. The model is applicable to any label-free peptide or protein quantification pipeline and is provided as an R script.
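
To make the notion of informative missingness concrete, the following Python sketch simulates log-abundances that are recorded only when they exceed a randomly drawn censoring threshold. It is an illustration under assumed values, not the authors' EBRCT model or their R script: the function name `simulate_observed`, the Gaussian form of the abundances and thresholds, and all numeric parameters are hypothetical choices made for this example. It shows that averaging only the observed values, as an MCAR-style analysis would, overestimates the true mean abundance.

```python
import random


def simulate_observed(true_mean, sd, thr_mean, thr_sd, n, rng):
    """Draw n log-abundances and keep only those above their own random threshold.

    Assumed mechanism for illustration: each measurement has an independent
    Gaussian censoring threshold, so low abundances are missing more often.
    """
    observed = []
    for _ in range(n):
        x = rng.gauss(true_mean, sd)      # true log-abundance of this measurement
        t = rng.gauss(thr_mean, thr_sd)   # random censoring threshold
        if x > t:                         # values below the threshold go missing
            observed.append(x)
    return observed


rng = random.Random(0)                    # fixed seed for reproducibility
n = 20000
true_mean = 10.0                          # hypothetical true mean log-abundance

obs = simulate_observed(true_mean, 1.0, 10.0, 0.5, n, rng)
missing_fraction = 1 - len(obs) / n
naive_mean = sum(obs) / len(obs)          # mean of observed values only

print(f"missing fraction: {missing_fraction:.2f}")
print(f"true mean: {true_mean:.2f}, naive mean of observed: {naive_mean:.2f}")
```

Because the missing values are preferentially the low ones, the naive mean sits above the true mean. A model that treats the threshold as part of the likelihood, as the abstract describes, is aimed at correcting exactly this kind of bias.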

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem*
Mass Spectrometry
Models, Statistical*
Proteins / analysis
Proteins / chemistry
Proteomics / methods*
ROC Curve

Substances

Proteins