Measuring stability of feature selection in biomedical datasets

AMIA Annu Symp Proc. 2009 Nov 14;2009:406-10.

Abstract

Feature selection is an important step in the analysis of high-dimensional biomedical data. Typically, a feature subset selected by a feature selection method is evaluated for its relevance to a task such as prediction or classification. Another important property of a feature selection method is stability, which refers to the robustness of the selected features to perturbations in the data. In biomarker discovery, for example, domain experts prefer a parsimonious subset of features that is relatively robust to slight changes in the data. We present a stability measure, called the adjusted stability measure, that computes the robustness of a feature selection method with respect to random feature selection. This measure is useful for comparing the robustness of feature selection methods and is superior to similar measures that do not account for random feature selection. We demonstrate the application of this measure on a biomedical dataset.
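
The abstract does not state the formula for the adjusted stability measure, so the following Python sketch is only an illustration of the general idea of correcting stability for random feature selection. It uses Kuncheva's consistency index as a stand-in, not the authors' measure: the overlap between two equal-sized feature subsets is compared with the overlap expected if both subsets were drawn at random. The function names, and the assumption that the subsets come from running one selection method on perturbed versions of the data, are hypothetical.

```python
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Chance-corrected overlap of two equal-sized feature subsets.

    Stand-in illustration (Kuncheva's consistency index), not the
    adjusted stability measure from the paper.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    observed = len(a & b)             # features shared by both subsets
    expected = k * k / n_features     # overlap expected under random selection
    return (observed - expected) / (k - expected)


def mean_pairwise_stability(subsets, n_features):
    """Average the chance-corrected index over all pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


# Hypothetical subsets selected on three perturbed copies of a
# dataset with 10 features (feature indices 0..9).
selected = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}]
stability = mean_pairwise_stability(selected, n_features=10)
```

Under this stand-in index, identical subsets score 1, subsets that overlap about as much as random selection would score near 0, and subsets that overlap less than chance score below 0. An uncorrected overlap measure cannot make that distinction, which is the motivation the abstract gives for adjusting for random selection.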

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • Classification / methods*
  • Computational Biology*
  • Databases, Factual*
  • Humans
  • Logistic Models
  • Mathematical Concepts
  • Neoplasms
  • Pattern Recognition, Automated*
  • Proteomics