Measuring stability of feature selection in biomedical datasets

Jonathan L Lustgarten; Vanathi Gopalakrishnan; Shyam Visweswaran

AMIA Annu Symp Proc. 2009 Nov 14;2009:406-10.

Authors

Jonathan L Lustgarten¹, Vanathi Gopalakrishnan, Shyam Visweswaran

Affiliation

¹ University of Pittsburgh Department of Biomedical Informatics, Pittsburgh, PA, USA.

PMID: 20351889
PMCID: PMC2815476

Abstract

Feature selection is an important step in the analysis of high-dimensional biomedical data. Typically, the feature subset chosen by a feature selection method is evaluated for its relevance to a task such as prediction or classification. Another important property of a feature selection method is stability, which refers to the robustness of the selected features to perturbations in the data. In biomarker discovery, for example, domain experts prefer a parsimonious subset of features that is relatively robust to slight changes in the data. We present a stability measure, the adjusted stability measure, that computes the robustness of a feature selection method relative to random feature selection. This measure is useful for comparing the robustness of feature selection methods and is superior to similar measures that do not account for random feature selection. We demonstrate the application of this measure on a biomedical dataset.
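
The idea of scoring stability relative to random feature selection can be illustrated with a small sketch. The abstract does not state the formula, so the code below is an assumption rather than the authors' definition: it uses one common chance-corrected form, in which the observed overlap between two selected subsets is reduced by the overlap expected from random selection and then scaled by the range of attainable overlaps. The function names `pairwise_adjusted_overlap` and `adjusted_stability` are hypothetical.

```python
from itertools import combinations


def pairwise_adjusted_overlap(subset_a, subset_b, n_features):
    """Chance-corrected overlap of two feature subsets (illustrative form only)."""
    a, b = set(subset_a), set(subset_b)
    k_a, k_b = len(a), len(b)
    observed = len(a & b)
    # Expected overlap if k_a and k_b features were drawn at random from n_features.
    expected = k_a * k_b / n_features
    # Largest and smallest overlaps that are possible for these subset sizes.
    max_overlap = min(k_a, k_b)
    min_overlap = max(0, k_a + k_b - n_features)
    spread = max_overlap - min_overlap
    if spread == 0:
        # Overlap is fully determined by the subset sizes; treat as uninformative.
        return 0.0
    return (observed - expected) / spread


def adjusted_stability(subsets, n_features):
    """Average the pairwise score over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    total = sum(pairwise_adjusted_overlap(a, b, n_features) for a, b in pairs)
    return total / len(pairs)


# Subsets selected on three perturbed versions of a dataset with 10 features.
runs = [{0, 1}, {0, 1}, {2, 3}]
score = adjusted_stability(runs, n_features=10)
```

In this sketch, identical subsets score above zero and disjoint subsets score below zero, so a method that selects features no more consistently than random choice lands near zero. An uncorrected overlap measure would not separate these cases from chance agreement.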

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Bayes Theorem
Classification / methods*
Computational Biology*
Databases, Factual*
Humans
Logistic Models
Mathematical Concepts
Neoplasms
Pattern Recognition, Automated*
Proteomics
