Application of survival analysis methodology to the quantitative analysis of LC-MS proteomics data

Bioinformatics. 2012 Aug 1;28(15):1998-2003. doi: 10.1093/bioinformatics/bts306. Epub 2012 May 24.

Abstract

Motivation: Protein abundance in quantitative proteomics is often based on observed spectral features derived from liquid chromatography mass spectrometry (LC-MS) or LC-MS/MS experiments. Peak intensities are largely non-normal in distribution. Furthermore, LC-MS-based proteomics data frequently have large proportions of missing peak intensities due to censoring mechanisms on low-abundance spectral features. Recognizing that the observed peak intensities detected with the LC-MS method are all positive, skewed and often left-censored, we propose using survival methodology to carry out differential expression analysis of proteins. Several standard statistical techniques were used to detect differentially expressed proteins: non-parametric tests such as the Kolmogorov-Smirnov and Wilcoxon-Mann-Whitney rank sum tests, and parametric survival models, namely accelerated failure time (AFT) models with log-normal, log-logistic and Weibull distributions. The statistical operating characteristics of each method are explored using both real and simulated datasets.
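To make the left-censoring idea concrete, the following is a minimal sketch of the log-likelihood for a log-normal accelerated failure time model with a single two-group covariate. It is not the authors' supplementary R code. The function name, the use of `None` to mark a missing peak, and the single shared detection limit `limit` are all assumptions made for illustration. An observed intensity contributes its log-normal density, while a missing intensity contributes the probability of falling below the detection limit.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log CDF of the standard normal, computed via the complementary error function."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(p, 1e-300))  # guard against log(0) for extreme z


def censored_lognormal_loglik(intensities, groups, limit, b0, b1, sigma):
    """Log-likelihood of log(y) = b0 + b1 * group + sigma * eps, eps ~ N(0, 1).

    intensities: peak intensities; None marks a missing value, treated here
                 as left-censored at `limit` (an assumed detection limit).
    groups:      0/1 group indicators, one per intensity.
    """
    total = 0.0
    for y, g in zip(intensities, groups):
        mu = b0 + b1 * g
        if y is None:
            # Censored: only know the intensity fell below the limit.
            total += norm_logcdf((math.log(limit) - mu) / sigma)
        else:
            # Observed: log-normal density, including the 1/y Jacobian term.
            z = (math.log(y) - mu) / sigma
            total += norm_logpdf(z) - math.log(sigma) - math.log(y)
    return total


# Hypothetical data: group 1 has lower intensities and more missing peaks.
intensities = [8.0, 6.5, 9.2, 7.1, 2.1, None, 1.8, None]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

null_ll = censored_lognormal_loglik(intensities, groups, 1.5, 1.5, 0.0, 1.0)
alt_ll = censored_lognormal_loglik(intensities, groups, 1.5, 2.0, -1.5, 0.5)
print("log-likelihood without group effect:", null_ll)
print("log-likelihood with group effect:", alt_ll)
```

In practice the parameters would be estimated by maximizing this function and the group effect assessed with a likelihood ratio or Wald test; the parameter values above are picked by hand only to show how the function is called.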

Results: Survival methods generally have greater statistical power than standard differential expression methods when the proportion of missing protein level data is 5% or more. In particular, the AFT models we consider consistently achieve greater statistical power than standard testing procedures, with the gap widening as the proportion of missing data increases.

Availability: The testing procedures discussed in this article can all be performed using readily available software such as R. The R code is provided as supplementary material.

Contact: ctekwe@stat.tamu.edu.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Chromatography, Liquid / methods
  • Computer Simulation
  • Diabetes Mellitus / metabolism
  • Humans
  • Likelihood Functions
  • Mass Spectrometry / methods
  • Models, Statistical*
  • Proteins / analysis*
  • Proteomics / methods*
  • Software*
  • Statistics, Nonparametric*
  • Tandem Mass Spectrometry / methods

Substances

  • Proteins