On the statistical analysis of the GS-NS0 cell proteome: imputation, clustering and variability testing

Biochim Biophys Acta. 2006 Jul;1764(7):1179-87. doi: 10.1016/j.bbapap.2006.05.002. Epub 2006 May 19.

Abstract

We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Owing to the nature of gel-based experiments, not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to estimate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results showed that although imputation of missing data was a useful method to improve the statistical analysis of such datasets, it was of limited use in differentiating between the samples investigated, and the analysis highlighted a small number of candidate proteins for further investigation.
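The abstract does not name the imputation technique used. As an illustration only, the following minimal sketch assumes a k-nearest-neighbour approach, one common choice for incomplete spot-intensity matrices: each missing value is filled with the mean of that sample's values in the k most similar spots. The function name `knn_impute`, the parameter `k`, and the toy intensity values are hypothetical and are not taken from the paper.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows (hypothetical helper).

    Rows are protein spots, columns are samples. Distance between two spots is
    the root-mean-square difference over the samples observed in both.
    """
    imputed = [row[:] for row in matrix]  # copy so the input is left untouched
    for i, row in enumerate(matrix):
        missing = [j for j, value in enumerate(row) if value is None]
        if not missing:
            continue
        # Rank every other spot by its distance to spot i.
        neighbours = []
        for m, other in enumerate(matrix):
            if m == i:
                continue
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue  # no common samples, so no usable distance
            dist = math.sqrt(
                sum((row[j] - other[j]) ** 2 for j in shared) / len(shared)
            )
            neighbours.append((dist, m))
        neighbours.sort()
        # For each gap, average the k closest spots that observed that sample.
        for j in missing:
            donors = [matrix[m][j] for _, m in neighbours
                      if matrix[m][j] is not None][:k]
            if donors:
                imputed[i][j] = sum(donors) / len(donors)
    return imputed


# Toy data (invented): 4 spots x 4 samples, one undetected spot marked None.
spots = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, None, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [5.0, 1.0, 0.5, 0.2],
]
filled = knn_impute(spots, k=2)
print(filled[1])
```

In this toy matrix the gap in the second spot is filled from the first and third spots, which have the most similar profiles; the fourth spot is too distant to contribute. A gap stays as `None` when no other spot observed that sample.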

MeSH terms

  • Algorithms*
  • Animals
  • Antibodies, Monoclonal / biosynthesis
  • Antibodies, Monoclonal / genetics
  • Cell Line, Tumor
  • Cluster Analysis
  • Electrophoresis, Gel, Two-Dimensional
  • Glutamate-Ammonia Ligase / biosynthesis
  • Glutamate-Ammonia Ligase / genetics
  • Image Processing, Computer-Assisted
  • Principal Component Analysis
  • Proteome / analysis*
  • Proteomics / methods
  • Proteomics / statistics & numerical data*
  • Recombinant Proteins / genetics
  • Recombinant Proteins / metabolism

Substances

  • Antibodies, Monoclonal
  • Proteome
  • Recombinant Proteins
  • Glutamate-Ammonia Ligase