Feature selection algorithm based on optimized genetic algorithm and the application in high-dimensional data processing

PLoS One. 2024 May 9;19(5):e0303088. doi: 10.1371/journal.pone.0303088. eCollection 2024.

Abstract

High-dimensional data is widely used in many fields, but selecting key features from it is challenging. Feature selection can reduce data dimensionality and suppress noise, thereby improving model efficiency and interpretability. To improve the efficiency and accuracy of high-dimensional data processing, this study proposes a feature selection method based on an optimized genetic algorithm. The algorithm simulates the process of natural selection, searches over candidate feature subsets, and finds the subset that optimizes model performance. The results show that the recognition rate is very low when the value of K is less than 4 or greater than 8. After adaptive bias filtering, 724 features are reduced to 372, and the accuracy improves from 0.9352 to 0.9815. When 714 features are reduced to 406 Gaussian codes, the accuracy improves from 0.9625 to 0.9754. Among all tests, the colon dataset has the highest average accuracy, followed by the small round blue cell tumor (SRBCT), lymphoma, central nervous system (CNS), and ovarian datasets. The green curve performs best, with stable performance over a time range of 0-300; while maintaining efficiency, it reaches 4.48 as quickly as possible. The feature selection method has practical significance for high-dimensional data processing: it improves the efficiency and accuracy of data processing and provides an effective new approach to the task.
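
The search loop described in the abstract (simulate natural selection over candidate feature subsets and keep the subset that scores best) can be illustrated with a minimal sketch. This is a plain genetic algorithm, not the authors' optimized variant: the abstract does not specify the optimization, the adaptive bias filtering, or the Gaussian coding, so none of them are modeled here. The function name `ga_select_features`, all parameter defaults, and the toy fitness function are hypothetical choices made for illustration only.

```python
import random


def ga_select_features(fitness, n_features, pop_size=30, generations=40,
                       crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search binary feature masks with a plain genetic algorithm.

    fitness: callable taking a list of 0/1 flags, returning a score to maximize.
    n_features must be at least 2 (one-point crossover needs a cut position).
    Returns (best_mask, best_score, history); history holds the best score
    seen so far at each generation.
    """
    rng = random.Random(seed)
    # Each individual is a mask: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    history = []

    for _ in range(generations):
        scores = [fitness(ind) for ind in population]

        # Remember the best individual seen so far.
        gen_best = max(range(pop_size), key=scores.__getitem__)
        if scores[gen_best] > best_score:
            best_mask, best_score = population[gen_best][:], scores[gen_best]
        history.append(best_score)

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_pop = [best_mask[:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

    return best_mask, best_score, history


# Toy fitness (an assumption for illustration): features 0-4 are treated as
# informative, and every selected feature carries a small cost.
INFORMATIVE = {0, 1, 2, 3, 4}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


mask, score, history = ga_select_features(toy_fitness, n_features=20)
print("selected features:", [i for i, g in enumerate(mask) if g])
print("best score:", score)
```

In practice the toy fitness would be replaced by a model-based score, such as the cross-validated accuracy of a classifier trained on the selected columns, possibly with a penalty on subset size. Because the best mask is carried into each new generation, the recorded best score never decreases from one generation to the next.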

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Humans

Grants and funding

The author(s) received no specific funding for this work.