Nature-inspired metaheuristics model for gene selection and classification of biomedical microarray data

Med Biol Eng Comput. 2022 Jun;60(6):1627-1646. doi: 10.1007/s11517-022-02555-7. Epub 2022 Apr 11.

Abstract

Identifying a small subset of informative genes from a gene expression dataset is an important step in sample classification in bioinformatics and machine learning. The task has two objectives: to minimize the number of selected genes and to maximize the classification accuracy of the classifier used. This paper proposes a hybrid machine learning framework based on the nature-inspired cuckoo search (CS) algorithm to address this problem. The framework combines CS with an artificial bee colony (ABC) algorithm within the exploitation and exploration phases of a genetic algorithm (GA). These strategies maintain an appropriate balance between the exploitation and exploration phases of the ABC and GA algorithms during the search. In preprocessing, independent component analysis (ICA) is used to extract the important genes from the dataset. The proposed gene selection algorithms are then applied together with the Naïve Bayes (NB) classifier and leave-one-out cross-validation (LOOCV) to find a small set of informative genes that maximizes classification accuracy. For a comprehensive performance study, the proposed algorithms were applied to six benchmark gene expression datasets. The experimental comparison shows that the proposed framework (ICA and the CS-based hybrid algorithm with the NB classifier) performs a deeper search during the iterative process, which can avoid premature convergence and produces better results than the previously published feature selection algorithm for the NB classifier.
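The wrapper evaluation described above, scoring a candidate gene subset by the LOOCV accuracy of an NB classifier while favoring fewer genes, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian form of NB, the weighted-sum fitness, the weight `alpha`, and all function names are assumptions made for illustration, and the ICA preprocessing and the CS/ABC/GA search itself are omitted.

```python
import math

def gaussian_nb_predict(train_X, train_y, x, eps=1e-9):
    """Predict the class of x with a Gaussian Naive Bayes model fit on train_X."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        score = math.log(len(rows) / len(train_X))  # log prior of the class
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            # eps keeps the variance positive when all values coincide
            var = sum((v - mean) ** 2 for v in col) / len(col) + eps
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of Gaussian NB restricted to the given gene indices."""
    reduced = [[row[g] for g in genes] for row in X]
    hits = 0
    for i in range(len(reduced)):
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += gaussian_nb_predict(train_X, train_y, reduced[i]) == y[i]
    return hits / len(reduced)

def fitness(X, y, mask, alpha=0.9):
    """Assumed weighted-sum fitness: reward accuracy and, less strongly, small subsets."""
    genes = [j for j, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0  # an empty subset cannot classify anything
    accuracy = loocv_accuracy(X, y, genes)
    sparsity = 1 - len(genes) / len(mask)
    return alpha * accuracy + (1 - alpha) * sparsity

# Toy data (hypothetical): gene 0 separates the classes, gene 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.1], [3.1, 5.2], [2.9, 2.9]]
y = [0, 0, 0, 1, 1, 1]
print(fitness(X, y, [1, 0]), fitness(X, y, [1, 1]))
```

In a metaheuristic search, each candidate solution would be a binary mask like `[1, 0]`, and a function of this kind would be called once per candidate per iteration. A weighted sum is only one way to combine the two objectives; the abstract does not say how the authors combine them.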

Keywords: Artificial bee colony (ABC); Cuckoo search (CS); Genetic algorithm (GA); Independent component analysis (ICA); Naïve Bayes (NB).

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Computational Biology*
  • Machine Learning
  • Microarray Analysis