Harmonious Genetic Clustering

IEEE Trans Cybern. 2018 Jan;48(1):199-214. doi: 10.1109/TCYB.2016.2628722. Epub 2017 Jan 5.

Abstract

To determine the number of clusters automatically and produce higher-quality clusters, we propose a harmonious genetic clustering algorithm, HGCA, based on harmonious mating in eugenic theory. Unlike existing genetic clustering methods, which use fitness alone, HGCA selects the most suitable mate for each chromosome and takes each chromosome's gender, age, and fitness into account when computing mating attractiveness. To avoid illegal mating, we design three mating prohibition schemes: no mating prohibition, mating prohibition based on lineal relativeness, and mating prohibition based on collateral relativeness. For harmonious mating, we design three mating strategies: a greedy eugenics-based mating strategy, a eugenics-based mating strategy based on weighted bipartite matching, and a eugenics-based mating strategy based on unweighted bipartite matching. In addition, a novel single-point crossover operator, called variable-length-and-gender-balance crossover, is devised to probabilistically guarantee the balance between the population gender ratio and the dynamics of chromosome lengths. We evaluate the proposed approach on real-life and artificial datasets, and the results show that our algorithm outperforms existing genetic clustering methods in terms of robustness, efficiency, and effectiveness.
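
The abstract does not give the attractiveness formula or the details of the mating strategies, so the following Python sketch is only an illustration of how a greedy mating step with a lineal-relativeness prohibition might look. Everything in it is an assumption made for illustration: the `Chromosome` fields, the `attractiveness` score (fitness discounted linearly by age, with a hypothetical `max_age`), the `lineally_related` check (direct parent and child only), and the pairing rule in `greedy_mating`. It should not be read as the HGCA implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chromosome:
    ident: int                      # unique id within the population
    gender: str                     # "F" or "M"
    age: int                        # generations survived so far
    fitness: float                  # higher is better
    parents: Tuple[int, ...] = ()   # ids of direct parents, if any


def attractiveness(a: Chromosome, b: Chromosome, max_age: int = 5) -> float:
    """Hypothetical score of how attractive b is to a.

    Assumption: same-gender pairs score zero, and fitness is
    discounted linearly as b approaches max_age.
    """
    if a.gender == b.gender:
        return 0.0
    age_factor = max(0.0, 1.0 - b.age / max_age)
    return b.fitness * age_factor


def lineally_related(a: Chromosome, b: Chromosome) -> bool:
    """Assumed lineal check: one chromosome is a direct parent of the other."""
    return a.ident in b.parents or b.ident in a.parents


def greedy_mating(population: List[Chromosome]) -> List[Tuple[int, int]]:
    """Pair females and males greedily by mutual attractiveness.

    Candidate pairs that violate the lineal prohibition are dropped,
    the rest are sorted by descending mutual score, and each
    chromosome is used in at most one pair.
    """
    females = [c for c in population if c.gender == "F"]
    males = [c for c in population if c.gender == "M"]
    candidates = [
        (attractiveness(f, m) + attractiveness(m, f), f.ident, m.ident)
        for f in females
        for m in males
        if not lineally_related(f, m)
    ]
    # Highest mutual score first; ids break ties deterministically.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    matched = set()
    pairs = []
    for score, fid, mid in candidates:
        if score <= 0 or fid in matched or mid in matched:
            continue
        pairs.append((fid, mid))
        matched.update((fid, mid))
    return pairs
```

A greedy pass like this is simple but can miss a globally better set of pairs, which is presumably why the abstract also lists strategies based on bipartite matching; those would replace the sorted loop with a matching over the same female-male candidate graph.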

MeSH terms

  • Algorithms
  • Animals
  • Chromosomes / genetics
  • Cluster Analysis*
  • Computational Biology / methods*
  • Databases, Genetic
  • Female
  • Male
  • Models, Genetic*
  • Reproduction / genetics