Sparse representation for classification of tumors using gene expression data

J Biomed Biotechnol. 2009;2009:403689. doi: 10.1155/2009/403689. Epub 2009 Mar 15.

Abstract

Personalized drug design requires classifying cancer patients as accurately as possible. With advances in genome sequencing and microarray technology, a large amount of gene expression data has been, and will continue to be, produced from cancer patients. Such gene expression data allows tumors to be classified at the genome-wide level. However, these datasets typically contain far more genes (features) than samples (patients), which poses a challenge for tumor classification. In this paper, a new method is proposed for cancer diagnosis using gene expression data by casting the classification problem as finding sparse representations of test samples with respect to training samples. The sparse representation is computed by the l1-regularized least squares method. To investigate its performance, the proposed method is applied to six tumor gene expression datasets and compared with various support vector machine (SVM) methods. The experimental results show that the performance of the proposed method is comparable with or better than that of SVMs. In addition, the proposed method is more efficient than SVMs because it requires no model selection.
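The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the training samples form the columns of a dictionary, that a test sample is written as a sparse combination of those columns by minimizing 0.5*||y - Ax||^2 + lam*||x||_1, and that the predicted class is the one whose training samples reconstruct the test sample with the smallest residual. The abstract does not state the decision rule, the solver, or any preprocessing, so the class-wise residual rule, the coordinate descent solver, the unit-norm scaling, the value of lam, and all function names and toy data here are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the update induced by the l1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean norm (assumed preprocessing)."""
    norm = dot(v, v) ** 0.5
    return [a / norm for a in v] if norm > 0 else list(v)


def l1_least_squares(columns, y, lam=0.01, n_sweeps=200):
    """Minimize 0.5*||y - sum_j x[j]*columns[j]||^2 + lam*||x||_1 by coordinate descent."""
    x = [0.0] * len(columns)
    residual = list(y)  # residual = y - A x, and x starts at zero
    col_sq = [dot(c, c) for c in columns]
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual, excluding its own contribution.
            rho = dot(col, residual) + col_sq[j] * x[j]
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


def classify(train_samples, train_labels, test_sample, lam=0.01):
    """Return the label whose training samples best reconstruct the test sample."""
    columns = [normalize(s) for s in train_samples]
    y = normalize(test_sample)
    x = l1_least_squares(columns, y, lam)
    best_label, best_err = None, float("inf")
    for label in sorted(set(train_labels)):
        # Reconstruct y using only the coefficients that belong to this class.
        recon = [0.0] * len(y)
        for coef, col, lab in zip(x, columns, train_labels):
            if lab == label and coef != 0.0:
                recon = [r + coef * c for r, c in zip(recon, col)]
        err = sum((a - b) ** 2 for a, b in zip(y, recon)) ** 0.5
        if err < best_err:
            best_label, best_err = label, err
    return best_label


# Hypothetical toy data: four "patients", four "genes", two tumor classes.
toy_train = [
    [5.0, 4.8, 0.2, 0.1],
    [4.9, 5.1, 0.1, 0.3],
    [0.2, 0.1, 5.0, 4.7],
    [0.3, 0.2, 4.8, 5.2],
]
toy_labels = ["A", "A", "B", "B"]
print(classify(toy_train, toy_labels, [5.1, 4.9, 0.15, 0.2]))
```

Apart from the regularization weight lam, this sketch has no kernel or margin parameters to tune, which is consistent with the abstract's remark about avoiding model selection. Real expression datasets with thousands of genes would call for a vectorized solver instead of pure-Python loops.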

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Artificial Intelligence
  • Databases, Genetic*
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Least-Squares Analysis
  • Models, Biological
  • Neoplasms / classification*
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis*
  • Pattern Recognition, Automated / methods*