A new population initialization of metaheuristic algorithms based on hybrid fuzzy rough set for high-dimensional gene data feature selection

Comput Biol Med. 2023 Nov:166:107538. doi: 10.1016/j.compbiomed.2023.107538. Epub 2023 Oct 4.

Abstract

Modern medicine and biology generate vast amounts of highly complex genetic data. The high dimensionality of such data increases both its size and the cost of processing it, so identifying critical genes to reduce dimensionality is essential. The filter-wrapper hybrid method is a commonly used approach to feature selection. Most such methods employ filters such as MRMR and ReliefF, but the performance of these simple filters is limited. Rough set methods, by contrast, are a type of filter method that outperforms traditional filters. In addition, many studies have pointed out that a good initialization strategy is crucial to the performance of metaheuristic algorithms (a type of wrapper-based method). Combining these two observations, this paper proposes a novel filter-wrapper hybrid method for high-dimensional feature selection. Specifically, we use a variant of the binary Whale Optimization Algorithm (bWOA) based on a Hybrid Fuzzy Rough Set to perform attribute reduction, and the reduced attributes serve as prior knowledge for initializing the population. We then apply metaheuristics to this initialized population for further feature selection. We conducted experiments with five different algorithms on 14 UCI datasets. The results show that the performance of all five algorithms improved significantly after applying the proposed initialization method. In particular, fuzzy_bMFO, the bMFO algorithm enhanced with our initialization method, outperformed six currently advanced algorithms, indicating that our initialization method for metaheuristic algorithms is suitable for high-dimensional feature selection tasks.
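
The idea of seeding a metaheuristic with filter-stage knowledge can be illustrated with a minimal sketch. The abstract does not specify how the reduced attributes are mapped to an initial population, so everything below is an assumption made for illustration: the function name `init_population`, the biasing probabilities `p_in` and `p_out`, and the choice to sample each bit independently are hypothetical and not taken from the paper. The `reduct` argument stands in for the attribute subset returned by the fuzzy rough set stage.

```python
import numpy as np


def init_population(n_agents, n_features, reduct, p_in=0.9, p_out=0.1, seed=0):
    """Sample a binary population biased toward a filter-selected subset.

    Hypothetical scheme (not from the paper): each feature in `reduct`
    is switched on with probability `p_in`, every other feature with
    probability `p_out`.
    """
    rng = np.random.default_rng(seed)
    # Per-feature probability of being selected in an agent.
    probs = np.full(n_features, p_out, dtype=float)
    probs[np.asarray(list(reduct), dtype=int)] = p_in
    # One row per agent, one column per feature; 1 means "feature selected".
    draws = rng.random((n_agents, n_features))
    return (draws < probs).astype(np.int8)


# Example: 6 agents, 20 features, filter stage kept features 2, 5 and 11.
population = init_population(n_agents=6, n_features=20, reduct=[2, 5, 11])
```

Each row of `population` is a candidate feature mask that a binary metaheuristic could then refine. Keeping `p_in` below 1 and `p_out` above 0 is one way to preserve diversity in the initial population, though the paper may balance prior knowledge and randomness differently.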

Keywords: Gene data feature selection; Hybrid fuzzy rough set; Multiclass classification; Population initialization; Whale optimization algorithm.

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Databases, Genetic
  • Fuzzy Logic*
  • Humans