Parsimonious selection of useful genes in microarray gene expression data

Adv Exp Med Biol. 2011:696:45-55. doi: 10.1007/978-1-4419-7046-6_5.

Abstract

Machine learning methods have recently contributed significantly to solving multidisciplinary problems in cancer classification from microarray gene expression data. These tasks are characterized by a large number of features and few observations, which makes modeling a nontrivial undertaking. In this study, we apply entropic filter methods for gene selection, in combination with several off-the-shelf classifiers. Bootstrap resampling is introduced to obtain more stable performance estimates. Our findings show that the proposed methodology permits a drastic reduction in dimension, offering attractive solutions in terms of both prediction accuracy and number of explanatory genes. A dimensionality reduction technique that preserves discrimination capabilities is used to visualize the selected genes.
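As a rough illustration of what an entropic filter combined with bootstrap resampling can look like, the sketch below ranks genes by their mutual information with the class label and counts how often each gene is selected across bootstrap resamples. It is not the authors' implementation: the abstract does not say which entropic criterion, discretization, or classifiers were used, so mutual information, the median-based binarization, the function names, and the toy data are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(feature, classes):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    joint = list(zip(feature, classes))
    return entropy(feature) + entropy(classes) - entropy(joint)


def discretize(values):
    """Binarize expression values at the upper median (assumed scheme)."""
    threshold = sorted(values)[len(values) // 2]
    return [int(v >= threshold) for v in values]


def rank_genes(X, y, k):
    """Indices of the k genes with the highest mutual information with y.

    X is a list of samples, each a list of expression values (one per gene).
    Ties are broken in favor of the lower gene index.
    """
    scores = []
    for j in range(len(X[0])):
        column = discretize([row[j] for row in X])
        scores.append((mutual_information(column, y), j))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]


def bootstrap_selection_frequency(X, y, k, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each gene is among the top k."""
    rng = random.Random(seed)
    n = len(X)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        counts.update(rank_genes([X[i] for i in idx], [y[i] for i in idx], k))
    return {j: c / n_boot for j, c in counts.items()}


# Toy data (hypothetical): 20 samples, 3 genes; only gene 0 tracks the class.
y = [0] * 10 + [1] * 10
X = [[5.0 * y[i] + 0.1 * (i % 3), float(i % 2), float((i * 7) % 5)]
     for i in range(20)]

top = rank_genes(X, y, k=1)
freq = bootstrap_selection_frequency(X, y, k=1)
print(top, freq)
```

Genes that are selected in most resamples are the more stable candidates. In a fuller pipeline, each resample would also train a classifier on the selected genes and score it on the samples left out of that resample.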

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Computational Biology
  • Data Mining
  • Databases, Genetic
  • Diagnosis, Differential
  • Female
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Male
  • Neoplasms / classification
  • Neoplasms / diagnosis
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*