Gene expression feature selection for prostate cancer diagnosis using a two-phase heuristic-deterministic search strategy

IET Syst Biol. 2018 Aug;12(4):162-169. doi: 10.1049/iet-syb.2017.0044.

Abstract

Here, a two-phase search strategy is proposed to identify biomarkers in a gene expression data set for prostate cancer diagnosis. A statistical filtering method is first applied to remove the noisiest data. In the first phase of the search, a multi-objective optimisation based on the binary particle swarm optimisation algorithm, tuned by a chaotic method, selects a subset of genes that minimises the number of genes while maximising classification accuracy. In the second phase, a cache-based modification of the sequential forward floating selection algorithm finds the most discriminant genes within the subset selected in the first phase. Applied to the available, challenging prostate cancer data set, the proposed algorithm identifies the informative genes such that a classification accuracy, sensitivity, and specificity of 100% are achieved with only nine biomarkers.
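
The first phase can be illustrated with a minimal sketch of a binary particle swarm optimisation whose inertia weight is driven by a chaotic sequence. This is not the authors' implementation: the abstract does not specify the chaotic method, the multi-objective formulation, or the classifier, so the logistic map, the weighted-sum fitness with weight `alpha`, the swarm parameters, and the stand-in scoring function `toy_accuracy` are all assumptions made for illustration.

```python
import math
import random


def logistic_map(x, r=4.0):
    """One step of the logistic map (assumed here as the chaotic sequence)."""
    return r * x * (1.0 - x)


def fitness(mask, accuracy_fn, alpha=0.9):
    """Weighted sum of the two objectives (an assumed scalarisation):
    higher accuracy is better, fewer selected genes is better."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0  # an empty gene subset cannot classify anything
    size_term = 1.0 - n_selected / len(mask)
    return alpha * accuracy_fn(mask) + (1.0 - alpha) * size_term


def chaotic_bpso(n_genes, accuracy_fn, n_particles=20, n_iter=50, seed=0):
    """Binary PSO over 0/1 gene masks with a chaotic inertia weight."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, accuracy_fn) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    w = 0.48            # chaotic state, reused as the inertia weight
    c1 = c2 = 2.0       # cognitive and social coefficients (assumed values)
    v_max = 6.0         # velocity clamp keeps the sigmoid away from 0 and 1

    for _ in range(n_iter):
        w = logistic_map(w)  # new inertia weight each iteration
        for i in range(n_particles):
            for d in range(n_genes):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # The sigmoid of the velocity is the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], accuracy_fn)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_accuracy(mask, informative=(0, 3)):
    """Hypothetical stand-in for a cross-validated classifier score:
    the fraction of the 'informative' genes that the mask includes."""
    return sum(mask[i] for i in informative) / len(informative)


best_mask, best_score = chaotic_bpso(8, toy_accuracy, seed=1)
print(best_mask, best_score)
```

In practice `toy_accuracy` would be replaced by the cross-validated accuracy of a classifier trained on the genes selected by the mask, and the resulting subset would then be passed to the second-phase floating search.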

Keywords: available challenging prostate cancer data; biological organs; biomarkers; cancer; chaotic method; discriminant genes; feature extraction; gene expression data; gene expression feature selection; genetics; heuristic-deterministic search strategy; informative genes; multiobjective optimisation; noisiest data; optimisation; particle swarm optimisation; particle swarm optimisation algorithm; pattern classification; prostate cancer diagnosis; search problems; selection algorithm; statistical filtering method; two-phase search strategy.