Minimum number of genes for microarray feature selection

Annu Int Conf IEEE Eng Med Biol Soc. 2008;2008:5692-5. doi: 10.1109/IEMBS.2008.4650506.

Abstract

A fundamental problem in microarray analysis is identifying relevant genes from large amounts of expression data. Feature selection aims to identify a subset of features for building robust learning models. However, finding the optimal number of features is challenging, because it involves a trade-off between information loss when pruning is excessive and increased noise when pruning is too weak. This paper presents a novel representation of genes as strings of bits and a method that automatically selects the minimum number of genes needed to reach good classification accuracy on the training set. Our method first eliminates redundant features, which add no further information for classification, and then applies a set-covering algorithm. Preliminary experimental results on public datasets confirm the intuition behind the proposed method, which achieves high classification accuracy.
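The abstract does not specify how the bit strings are built or which set-covering algorithm is used, so the following is only a hypothetical sketch of the general pipeline, not the authors' code. It assumes that (i) bit j of a gene is set when sample j's expression value lies outside the expression range of every other class for that gene, (ii) a gene is redundant when its bit string is a subset of another gene's bit string, and (iii) a greedy set cover stands in for the covering step. Greedy covering is an approximation and does not guarantee the true minimum number of genes. The names `gene_masks`, `drop_redundant` and `greedy_cover` are illustrative.

```python
from typing import List, Sequence


def gene_masks(expr: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Encode each gene (row of expr) as an int bitmask over samples.

    Hypothetical scheme: bit j is set when sample j's value falls outside
    the [min, max] range of every other class for that gene.
    """
    classes = sorted(set(labels))
    masks = []
    for row in expr:
        # Per-class expression range of this gene.
        ranges = {}
        for c in classes:
            vals = [v for v, lab in zip(row, labels) if lab == c]
            ranges[c] = (min(vals), max(vals))
        mask = 0
        for j, (v, lab) in enumerate(zip(row, labels)):
            others = [(lo, hi) for c, (lo, hi) in ranges.items() if c != lab]
            if all(not (lo <= v <= hi) for lo, hi in others):
                mask |= 1 << j
        masks.append(mask)
    return masks


def drop_redundant(masks: List[int]) -> List[int]:
    """Return indices of genes whose mask is not contained in another gene's mask."""
    keep = []
    for i, m in enumerate(masks):
        dominated = False
        for k, other in enumerate(masks):
            if k == i:
                continue
            # m is a subset of other; among identical masks keep the lowest index.
            if m | other == other and (m != other or k < i):
                dominated = True
                break
        if not dominated and m:  # genes covering no sample are dropped too
            keep.append(i)
    return keep


def greedy_cover(masks: List[int], candidates: List[int]) -> List[int]:
    """Greedy set cover: pick genes until every coverable sample is covered."""
    universe = 0
    for i in candidates:
        universe |= masks[i]
    covered, chosen = 0, []
    while covered != universe:
        # Gene adding the most not-yet-covered samples (first one wins ties).
        best = max(candidates, key=lambda i: bin(masks[i] & ~covered).count("1"))
        chosen.append(best)
        covered |= masks[best]
    return chosen


if __name__ == "__main__":
    # Toy data: 4 genes x 6 samples, two classes (0 and 1).
    expr = [
        [1, 2, 8, 7, 8, 9],
        [5, 5, 1, 9, 9, 3],
        [1, 5, 5, 5, 5, 5],
        [5, 5, 1, 9, 9, 3],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    masks = gene_masks(expr, labels)
    kept = drop_redundant(masks)
    print(masks, kept, greedy_cover(masks, kept))  # [35, 28, 1, 28] [0, 1] [0, 1]
```

In this toy example gene 2 is dropped because its bit string is a subset of gene 0's, gene 3 is dropped as a duplicate of gene 1, and two genes suffice to cover all six samples. Representing bit strings as Python integers keeps the subset test (`m | other == other`) and the coverage update to single bitwise operations.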

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Diagnosis, Computer-Assisted / methods*
  • Gene Expression Profiling / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods*
  • Reproducibility of Results
  • Sample Size
  • Sensitivity and Specificity
  • Signal Processing, Computer-Assisted