A novel and innovative cancer classification framework through a consecutive utilization of hybrid feature selection

BMC Bioinformatics. 2023 Dec 15;24(1):479. doi: 10.1186/s12859-023-05605-5.

Abstract

Cancer prediction in the early stage is a topic of major interest in medicine since it allows accurate and efficient actions for successful medical treatments of cancer. Mostly cancer datasets contain various gene expression levels as features with less samples, so firstly there is a need to eliminate similar features to permit faster convergence rate of classification algorithms. These features (genes) enable us to identify cancer disease, choose the best prescription to prevent cancer and discover deviations amid different techniques. To resolve this problem, we proposed a hybrid novel technique CSSMO-based gene selection for cancer classification. First, we made alteration of the fitness of spider monkey optimization (SMO) with cuckoo search algorithm (CSA) algorithm viz., CSSMO for feature selection, which helps to combine the benefit of both metaheuristic algorithms to discover a subset of genes which helps to predict a cancer disease in early stage. Further, to enhance the accuracy of the CSSMO algorithm, we choose a cleaning process, minimum redundancy maximum relevance (mRMR) to lessen the gene expression of cancer datasets. Next, these subsets of genes are classified using deep learning (DL) to identify different groups or classes related to a particular cancer disease. Eight different benchmark microarray gene expression datasets of cancer have been utilized to analyze the performance of the proposed approach with different evaluation matrix such as recall, precision, F1-score, and confusion matrix. The proposed gene selection method with DL achieves much better classification accuracy than other existing DL and machine learning classification models with all large gene expression dataset of cancer.

Keywords: Cancer classification; Cuckoo search algorithm (CSA); Deep learning (DL); Minimum redundancy maximum relevance (mRMR); Spider monkey optimization (SM).

MeSH terms

  • Algorithms*
  • Genetic Techniques
  • Humans
  • Machine Learning
  • Microarray Analysis
  • Neoplasms* / genetics