Empirical Bayesian random censoring threshold model improves detection of differentially abundant proteins

J Proteome Res. 2014 Sep 5;13(9):3871-80. doi: 10.1021/pr500171u. Epub 2014 Aug 22.

Abstract

A challenge in proteomics is that many observations are missing, with the probability of missingness increasing as abundance decreases. Adjusting for this informative missingness is required to assess accurately which proteins are differentially abundant. We propose an empirical Bayesian random censoring threshold (EBRCT) model that takes the pattern of missingness into account in the identification of differential abundance. We compare our model with four alternatives: one that considers the missing values as missing completely at random (MCAR model), one with a fixed censoring threshold for each protein species (fixed censoring model), and two imputation models, k-nearest neighbors (IKNN) and singular value thresholding (SVTI). We demonstrate that the EBRCT model bests all alternative models when applied to the CPTAC study 6 benchmark data set. The model is applicable to any label-free peptide or protein quantification pipeline and is provided as an R script.
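To illustrate why informative missingness matters, the following is a minimal sketch, not the authors' EBRCT implementation (which is distributed as an R script and treats the censoring threshold as random). It assumes a simpler setting in the spirit of the fixed censoring model: log-abundances are normally distributed with a known standard deviation, and a value is missing exactly when it falls below a known threshold. All names and numbers here (`censored_loglik`, `fit_mean`, the example intensities, the threshold) are hypothetical and chosen only for illustration.

```python
import math


def norm_logpdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(x, mu, sigma):
    """Log of P(X <= x) for a normal distribution, computed via erfc."""
    z = (x - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(mu, sigma, observed, n_missing, threshold):
    """Observed values contribute their density; each missing value
    contributes the probability of falling below the threshold."""
    ll = sum(norm_logpdf(x, mu, sigma) for x in observed)
    ll += n_missing * norm_logcdf(threshold, mu, sigma)
    return ll


def fit_mean(observed, n_missing, threshold, sigma, steps=2000):
    """Grid search for the mean that maximizes the censored likelihood."""
    lo = min(observed) - 5.0 * sigma
    hi = max(observed)
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(
        grid,
        key=lambda m: censored_loglik(m, sigma, observed, n_missing, threshold),
    )


# Hypothetical log-intensities for one protein in one condition:
# three replicates observed, three missing (assumed below the threshold).
observed = [20.1, 20.6, 21.0]
n_missing = 3
threshold = 20.0  # assumed fixed censoring threshold
sigma = 0.5       # assumed known standard deviation

mcar_mean = sum(observed) / len(observed)
censored_mean = fit_mean(observed, n_missing, threshold, sigma)

print("MCAR estimate of the mean:    ", round(mcar_mean, 3))
print("Censored estimate of the mean:", round(censored_mean, 3))
```

Under these assumptions, the MCAR estimate ignores the missing replicates and therefore overstates the mean, whereas the censored likelihood pulls the estimate down toward the threshold. With no missing values, the two estimates coincide up to the grid resolution.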

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem*
  • Mass Spectrometry
  • Models, Statistical*
  • Proteins / analysis
  • Proteins / chemistry
  • Proteomics / methods*
  • ROC Curve

Substances

  • Proteins