Colon cancer data analysis by chameleon algorithm

Health Inf Sci Syst. 2019 Oct 14;7(1):23. doi: 10.1007/s13755-019-0085-1. eCollection 2019 Dec.

Abstract

Detecting the key differential genes of colon cancer is important for distinguishing colon cancer patients from normal people. A gene selection algorithm for colon cancer is proposed that exploits the dynamic modeling properties of the chameleon algorithm and its ability to discover clusters of arbitrary shape. The algorithm comprises three steps. First, genes with higher Fisher function values are selected as candidate genes. Second, the chameleon algorithm, using Euclidean distance, groups the candidate genes into gene clusters. Third, the most important gene in each cluster, as ranked by its information index to classification, is selected to form the gene subset. The chameleon algorithm is then used to cluster colon cancer patients and normal people using only the genes in the gene subset. The final clustering accuracy of the chameleon algorithm with the selected genes reaches 85.48%. The clustering analysis of the colon cancer data and comparisons with related studies demonstrate that the proposed algorithm is effective in detecting the differential genes of colon cancer.

Keywords: Chameleon algorithm; Clustering; Colon cancer; Fisher function; Gene subset selection; Information index to classification.
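
The first step of the abstract (ranking genes by a Fisher function) and the final evaluation (clustering accuracy) can be illustrated with a minimal sketch. The abstract does not give the exact form of the Fisher function, so the code below assumes the common two-class Fisher criterion, (mean difference)^2 / (sum of class variances); the paper's definition may differ. The function names `fisher_scores`, `top_k_genes`, and `clustering_accuracy` are hypothetical, and the accuracy helper assumes there are no more clusters than classes. The chameleon clustering and the information index to classification are not sketched here because the abstract does not specify them.

```python
from itertools import permutations

import numpy as np


def fisher_scores(X, y):
    """Per-gene Fisher criterion for a two-class problem (assumed form).

    X: array of shape (n_samples, n_genes); y: binary labels (0 or 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    between = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    within = a.var(axis=0) + b.var(axis=0)
    # A small constant avoids division by zero for constant genes.
    return between / (within + 1e-12)


def top_k_genes(X, y, k):
    """Indices of the k genes with the highest Fisher scores."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]


def clustering_accuracy(y_true, y_pred):
    """Best fraction of matching labels over all cluster-to-class mappings.

    Assumes the number of clusters does not exceed the number of classes.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_labels = np.unique(y_true)
    pred_labels = np.unique(y_pred)
    best = 0.0
    for perm in permutations(true_labels, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        mapped = np.array([mapping[p] for p in y_pred])
        best = max(best, float(np.mean(mapped == y_true)))
    return best
```

With an expression matrix `X` (samples in rows, genes in columns) and labels `y`, `top_k_genes(X, y, k)` would give the candidate genes passed on to the clustering step, and `clustering_accuracy` would score the final patient/normal grouping against the known labels.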