A new feature selection scheme using a data distribution factor for unsupervised nominal data

Tommy W S Chow; Piyang Wang; Eden W M Ma

doi:10.1109/TSMCB.2007.914707

IEEE Trans Syst Man Cybern B Cybern. 2008 Apr;38(2):499-509. doi: 10.1109/TSMCB.2007.914707.

Authors

Tommy W S Chow¹, Piyang Wang, Eden W M Ma

Affiliation

¹ Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong. eetchow@cityu.edu.hk

PMID: 18348931

Abstract

An efficient unsupervised feature selection method is proposed that handles nominal data without data transformation. The method introduces a data distribution factor to select appropriate clusters, and it combines compactness and separation with a newly introduced concept, the singleton item. It considers all features globally and is computationally inexpensive. The method is evaluated on eight datasets from the University of California, Irvine (UCI) Machine Learning Repository and on a high-dimensional cDNA dataset. The results show that the proposed method is efficient and delivers reliable results.

MeSH terms

Algorithms*
Artificial Intelligence*
Cluster Analysis*
Computer Simulation
Decision Support Techniques*
Information Storage and Retrieval / methods*
Models, Statistical
Pattern Recognition, Automated / methods*
Statistical Distributions