A new feature selection scheme using a data distribution factor for unsupervised nominal data

IEEE Trans Syst Man Cybern B Cybern. 2008 Apr;38(2):499-509. doi: 10.1109/TSMCB.2007.914707.

Abstract

An efficient unsupervised feature selection method is proposed that handles nominal data directly, without data transformation. The method introduces a new data distribution factor to select appropriate clusters, and combines compactness and separation with a newly introduced concept of singleton items. It considers all features globally and is computationally inexpensive. The method is evaluated on eight datasets from the University of California, Irvine (UCI) machine learning repository and on a high-dimensional cDNA dataset. The results show that it is efficient and delivers reliable results.
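The abstract does not define the data distribution factor or the singleton item, so neither is reproduced here. As a rough illustration of the general idea only, the sketch below scores a candidate feature subset by cluster compactness and separation computed directly on nominal values, with no numeric encoding. The simple-matching dissimilarity, the function names, the toy rows, and the given cluster labels are all assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations


def mismatch(a, b, features):
    """Simple-matching dissimilarity (an assumed measure, not the paper's):
    the fraction of the selected features whose nominal values differ."""
    return sum(a[f] != b[f] for f in features) / len(features)


def compactness_separation(rows, labels, features):
    """Return (compactness, separation) for a given clustering.

    compactness: mean dissimilarity over pairs in the same cluster (lower is tighter).
    separation:  mean dissimilarity over pairs in different clusters (higher is better).
    """
    within, between = [], []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        target = within if labels[i] == labels[j] else between
        target.append(mismatch(a, b, features))
    compactness = sum(within) / len(within) if within else 0.0
    separation = sum(between) / len(between) if between else 0.0
    return compactness, separation


# Hypothetical toy data: three nominal features per row, two clusters.
rows = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "small"),
    ("blue", "square", "large"),
]
labels = [0, 0, 1, 1]

all_features = compactness_separation(rows, labels, features=(0, 1, 2))
without_size = compactness_separation(rows, labels, features=(0, 1))
```

In this toy case, dropping the third feature tightens the clusters and widens the gap between them, which is the kind of signal a compactness-and-separation criterion could use to rank feature subsets.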

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Cluster Analysis*
  • Computer Simulation
  • Decision Support Techniques*
  • Information Storage and Retrieval / methods*
  • Models, Statistical
  • Pattern Recognition, Automated / methods*
  • Statistical Distributions