Dimension reduction for classification with gene expression microarray data

Stat Appl Genet Mol Biol. 2006:5:Article6. doi: 10.2202/1544-6115.1147. Epub 2006 Feb 24.

Abstract

An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures incorporating those methods. A five-step assessment procedure is designed for the purpose. Predictive accuracy and computational efficiency of the methods are examined. Two gene expression data sets for tumor classification are used in the study.
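The abstract names the methods but does not spell out the workflow, so the following is only a minimal sketch of the general idea of reducing dimension before classifying, using PCA as the example. Everything in it is an assumption made for illustration rather than the paper's procedure: the data are synthetic, the function names (`make_data`, `first_pc`, `demo`) are hypothetical, only the leading principal component is kept, and a simple nearest-class-mean rule stands in for whatever classifier the authors used. It is written in plain Python to stay self-contained; a numerical library would be the practical choice for real microarray data.

```python
import random

def make_data(n, p, n_informative, shift, rng):
    """Synthetic 'expression' matrix (assumption): class 1 is shifted
    upward in the first n_informative of p genes; labels alternate 0/1."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(0.0, 1.0)
                  + (shift if label == 1 and j < n_informative else 0.0)
                  for j in range(p)])
        y.append(label)
    return X, y

def dot(a, b):
    return sum(x * z for x, z in zip(a, b))

def first_pc(Xc, rng, iters=100):
    """Leading principal direction of centered data Xc, found by power
    iteration on X^T X (applied as X^T (X v) to avoid a p-by-p matrix)."""
    cols = list(zip(*Xc))
    v = [rng.gauss(0.0, 1.0) for _ in cols]
    for _ in range(iters):
        scores = [dot(row, v) for row in Xc]       # X v
        w = [dot(scores, col) for col in cols]     # X^T (X v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return v

def demo(seed=0, n_train=40, n_test=20, p=100):
    rng = random.Random(seed)
    X, y = make_data(n_train + n_test, p, n_informative=20, shift=3.0, rng=rng)
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]

    # Center with training means only, so no test information leaks in.
    means = [sum(col) / n_train for col in zip(*X_tr)]
    center = lambda rows: [[x - m for x, m in zip(r, means)] for r in rows]
    Xc_tr, Xc_te = center(X_tr), center(X_te)

    # Dimension reduction: p genes -> one component score per sample.
    v = first_pc(Xc_tr, rng)
    s_tr = [dot(r, v) for r in Xc_tr]
    s_te = [dot(r, v) for r in Xc_te]

    # Nearest-class-mean rule on the reduced score (stand-in classifier).
    m = {c: sum(s for s, t in zip(s_tr, y_tr) if t == c) / y_tr.count(c)
         for c in (0, 1)}
    pred = [min((0, 1), key=lambda c: abs(s - m[c])) for s in s_te]
    return sum(a == b for a, b in zip(pred, y_te)) / n_test

if __name__ == "__main__":
    print("held-out accuracy:", demo())
```

PCA picks its direction without looking at the class labels, whereas PLS and SIR use the response when forming components; that difference is the sort of thing a comparison like the one described above would probe. The sketch does not implement PLS or SIR.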

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Colonic Neoplasms / classification
  • Colonic Neoplasms / genetics
  • Colonic Neoplasms / metabolism
  • Gene Expression Profiling / methods*
  • Least-Squares Analysis
  • Leukemia / classification
  • Leukemia / genetics
  • Leukemia / metabolism
  • Logistic Models
  • Neoplasms / classification*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Principal Component Analysis
  • Regression Analysis