An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data

Comb Chem High Throughput Screen. 2018;21(9):631-645. doi: 10.2174/1386207322666181220124756.

Abstract

Aim and objective: Cancer is a deadly disease worldwide, caused by somatic mutations in the genome. Early-stage diagnosis of cancer is a relatively new clinical application of microarray data. Gene expression data produced by DNA microarray technology are high-dimensional but have small sample sizes. Efficient and robust feature selection methods that identify a small set of genes while achieving better classification performance are therefore indispensable.

Materials and methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) with a Multi-Objective Evolutionary Algorithm (MOEA) to select highly informative genes. The hybrid model, paired with a Radial Basis Function Neural Network (RBFNN) classifier, was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation.
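As an illustration of the CFS component named above, the following minimal sketch computes the standard CFS merit of a candidate gene subset: merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. It is not the authors' implementation. The function name `cfs_merit`, the use of absolute Pearson correlation as the correlation measure, and the toy data are assumptions made for this example; the abstract does not specify these details, nor how the MOEA uses the merit.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """Return the CFS merit of the feature columns listed in `subset`.

    X is a (samples, features) array, y a numeric class-label vector.
    Assumption: correlations are absolute Pearson coefficients.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    Xs = X[:, subset]
    # Mean absolute correlation between each selected feature and the class.
    r_cf = np.mean([abs(np.corrcoef(Xs[:, j], y)[0, 1]) for j in range(k)])
    if k == 1:
        return float(r_cf)
    # Mean absolute correlation over distinct feature pairs (off-diagonal).
    corr = np.abs(np.corrcoef(Xs, rowvar=False))
    r_ff = (corr.sum() - np.trace(corr)) / (k * (k - 1))
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


# Toy data (hypothetical): gene 0 tracks the class, gene 1 is unrelated.
y = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
X = np.column_stack([y, np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)])
for genes in ([0], [1], [0, 1]):
    print(genes, round(cfs_merit(X, y, genes), 4))
```

On this toy data the informative gene alone scores higher than the pair, because adding an uninformative gene lowers the mean feature-class correlation. A multi-objective search would typically weigh a score like this, or classifier accuracy, against subset size.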

Results: The experimental results were compared with seven conventional feature selection methods and with other methods in the literature. The comparison shows that our approach has clear advantages in both classification accuracy and the number of genes selected.

Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy on six of the eleven datasets with a minimally sized predictive gene subset.

Keywords: Cancer classification; correlation-based feature selection; gene expression data; multi-objective evolutionary algorithm; radial basis function neural network.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biomarkers, Tumor / genetics
  • Databases, Genetic
  • Gene Expression Profiling
  • Gene Expression*
  • Humans
  • Neoplasms / classification*
  • Neoplasms / genetics
  • Neural Networks, Computer
  • Oligonucleotide Array Sequence Analysis

Substances

  • Biomarkers, Tumor