Frequency based feature selection method using whale algorithm

Genomics. 2019 Dec;111(6):1946-1955. doi: 10.1016/j.ygeno.2019.01.006. Epub 2019 Jan 17.

Abstract

Feature selection is the problem of finding the subset of features that has the greatest impact on predicting class labels. It is especially valuable for high-dimensional datasets. This paper proposes a filter feature selection method for high-dimensional binary medical datasets: Colon, Central Nervous System (CNS), GLI_85, and SMK_CAN_187. The proposed method has three stages. First, the whale algorithm is used to discard irrelevant features. Second, the remaining features are ranked using a frequency-based heuristic called Mutual Congestion. Third, majority voting is applied to the best feature subsets constructed by forward feature selection with threshold τ = 10. This work provides evidence that Mutual Congestion is, by itself, powerful enough to predict class labels. Furthermore, applying the whale algorithm increases the overall accuracy of Mutual Congestion in most cases. The findings also show that the proposed method improves prediction while selecting fewer features than state-of-the-art methods. https://github.com/hnematzadeh.
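The third stage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define τ or the scoring procedure, so the sketch assumes that τ caps how many top-ranked features are considered as candidates, and that subset quality comes from a caller-supplied `score` function (for example, cross-validated accuracy). The names `forward_select`, `majority_vote`, and `toy_score` are hypothetical, and the whale algorithm and Mutual Congestion ranking are assumed to have already produced the ranked list.

```python
from collections import Counter
from typing import Callable, List, Sequence


def forward_select(
    ranked: Sequence[int],
    score: Callable[[List[int]], float],
    tau: int = 10,
) -> List[int]:
    """Greedy forward selection over a pre-ranked feature list.

    ASSUMPTION: tau limits the candidates to the first `tau` ranked
    features; the abstract does not state what the threshold controls.
    A feature is kept only if it strictly improves the score.
    """
    selected: List[int] = []
    best = float("-inf")
    for feature in ranked[:tau]:
        candidate_score = score(selected + [feature])
        if candidate_score > best:
            selected.append(feature)
            best = candidate_score
    return selected


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by per-sample majority vote.

    `predictions[m][i]` is model m's label for sample i. Ties are
    broken toward the smallest label (an arbitrary choice).
    """
    voted: List[int] = []
    for sample_preds in zip(*predictions):
        counts = Counter(sample_preds)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


def toy_score(subset: List[int]) -> float:
    """Hypothetical score: rewards features 2 and 5, penalizes subset size."""
    return len({2, 5} & set(subset)) - 0.1 * len(subset)


if __name__ == "__main__":
    print(forward_select([5, 1, 2, 7], toy_score, tau=10))      # [5, 2]
    print(majority_vote([[0, 1, 1], [0, 0, 1], [1, 0, 1]]))     # [0, 0, 1]
```

In practice, `score` would train and evaluate a classifier on the candidate subset, and each row passed to `majority_vote` would hold the predictions of a model trained on one of the best subsets.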

Keywords: Feature selection; Mutual congestion; Whale algorithm.

MeSH terms

  • Algorithms*
  • Animals
  • Central Nervous System
  • Colon
  • Databases, Factual*
  • Probability
  • Support Vector Machine
  • Whales*