Regularization strategies for hyperplane classifiers: application to cancer classification with gene expression data

J Bioinform Comput Biol. 2007 Feb;5(1):79-104. doi: 10.1142/s0219720007002539.

Abstract

From the point of view of numerical linear algebra, linear discrimination can be treated as solving an ill-posed system of linear equations. Such problems require regularization to produce a solution that is robust in the presence of noise. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based on the singular value decomposition, yields insight into the numerical ill-posedness of hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
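
As an illustration of what a filter factor representation can look like, the following is a minimal sketch and not the authors' implementation. It assumes a least-squares formulation X w ≈ y, where the rows of X are samples and y holds ±1 class labels, and it uses two textbook filters: Tikhonov (ridge) and truncated SVD. The abstract does not state which classifiers or filters the paper studies, so the function names, the synthetic data, and the parameter values (lam, k) are all hypothetical.

```python
import numpy as np

def filter_factor_solution(X, y, factors):
    """Regularized solution w = sum_i f_i * (u_i . y / s_i) * v_i of X w ~ y.

    `factors` maps the singular values s to filter factors f in [0, 1].
    Assumes all singular values are nonzero (true for the synthetic data below).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    coeffs = factors(s) * (U.T @ y) / s
    return Vt.T @ coeffs

def tikhonov_factors(lam):
    # Ridge / Tikhonov filter: smoothly damps components with s_i << lam.
    return lambda s: s**2 / (s**2 + lam**2)

def tsvd_factors(k):
    # Truncated SVD filter: keep the k largest singular components, drop the rest.
    return lambda s: (np.arange(s.size) < k).astype(float)

# Synthetic stand-in for expression data: 20 samples, 200 genes (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
y = np.sign(rng.normal(size=20))

w_ridge = filter_factor_solution(X, y, tikhonov_factors(lam=1.0))
w_tsvd = filter_factor_solution(X, y, tsvd_factors(k=5))

# Plotting the filter factors against the singular values is one simple
# diagnostic of how strongly each component is suppressed.
s = np.linalg.svd(X, compute_uv=False)
ridge_f = tikhonov_factors(1.0)(s)
```

In this sketch, a new sample x would be assigned to a class by the sign of x @ w. Because the number of genes far exceeds the number of samples, the unfiltered solution (all factors equal to 1) fits the training labels exactly, and the filter factors control how much of each singular component enters the hyperplane normal.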

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Artificial Intelligence
  • Biomarkers, Tumor / metabolism*
  • Diagnosis, Computer-Assisted / methods*
  • Gene Expression Profiling / methods*
  • Humans
  • Neoplasm Proteins / metabolism*
  • Neoplasms / classification
  • Neoplasms / diagnosis*
  • Neoplasms / metabolism*
  • Oligonucleotide Array Sequence Analysis / methods
  • Pattern Recognition, Automated / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins