A new hybrid algorithm for three-stage gene selection based on whale optimization

Sci Rep. 2023 Mar 7;13(1):3783. doi: 10.1038/s41598-023-30862-y.

Abstract

In biomedical data mining, the gene dimension is often much larger than the sample size. To solve this problem, we need to use a feature selection algorithm to select feature gene subsets with a strong correlation with phenotype to ensure the accuracy of subsequent analysis. This paper presents a new three-stage hybrid feature gene selection method, that combines a variance filter, extremely randomized tree, and whale optimization algorithm. First, a variance filter is used to reduce the dimension of the feature gene space, and an extremely randomized tree is used to further reduce the feature gene set. Finally, the whale optimization algorithm is used to select the optimal feature gene subset. We evaluate the proposed method with three different classifiers in seven published gene expression profile datasets and compare it with other advanced feature selection algorithms. The results show that the proposed method has significant advantages in a variety of evaluation indicators.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Data Mining
  • Phenotype
  • Sample Size
  • Whales*