Information bottleneck based incremental fuzzy clustering for large biomedical data

J Biomed Inform. 2016 Aug:62:48-58. doi: 10.1016/j.jbi.2016.05.009. Epub 2016 May 31.

Abstract

Incremental fuzzy clustering combines the advantages of fuzzy clustering and incremental clustering, and is therefore important for classifying large collections of biomedical literature. Conventional algorithms, which suffer from data sparsity and high dimensionality, often fail to produce reasonable results and may even assign all objects to a single cluster. In this paper, we propose two incremental algorithms based on the information bottleneck: Single-Pass fuzzy c-means (spFCM-IB) and Online fuzzy c-means (oFCM-IB). These two algorithms modify the conventional algorithms by assigning different weights to each centroid and object and by using mutual information loss to measure the distance between centroids and objects. spFCM-IB and oFCM-IB are used to group a collection of biomedical text abstracts from the Medline database. Experimental results show that our approaches achieve higher clustering accuracy than prominent counterparts such as spFCM, spHFCM, oFCM and oHFCM.
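
The abstract does not give the exact distance formula, so the following is only a minimal sketch of how a mutual-information-loss distance between a weighted object and a weighted centroid could look. It assumes the standard agglomerative information bottleneck merge cost (the combined weight times the weighted Jensen-Shannon divergence between the two word distributions), which may differ from the formulation used in spFCM-IB and oFCM-IB. The function names `kl_divergence` and `ib_merge_cost` and all parameter names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p[i] == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ib_merge_cost(w_obj, p_obj, w_cen, p_cen):
    """Mutual information lost by merging an object into a centroid.

    Assumed form (standard agglomerative information bottleneck, not
    necessarily the paper's): (w_obj + w_cen) * JS_pi(p_obj, p_cen),
    where pi are the weights normalised to sum to one.
    w_obj, w_cen: positive weights (prior mass) of object and centroid.
    p_obj, p_cen: conditional distributions over terms, each summing to 1.
    """
    total = w_obj + w_cen
    pi_obj, pi_cen = w_obj / total, w_cen / total
    # Distribution of the merged cluster: weighted mixture of the two.
    mix = [pi_obj * a + pi_cen * b for a, b in zip(p_obj, p_cen)]
    js = pi_obj * kl_divergence(p_obj, mix) + pi_cen * kl_divergence(p_cen, mix)
    return total * js


# Toy example: one document and two candidate centroids over three terms.
doc = [0.7, 0.2, 0.1]
near = [0.6, 0.3, 0.1]
far = [0.1, 0.1, 0.8]
cost_near = ib_merge_cost(0.1, doc, 0.4, near)
cost_far = ib_merge_cost(0.1, doc, 0.4, far)
```

In this sketch a smaller cost means less information about the term distribution is lost by the merge, so `cost_near` comes out lower than `cost_far`. Because the cost scales with the combined weight, heavier centroids and objects incur a larger loss for the same divergence, which is one way per-centroid and per-object weights can enter the distance.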

Keywords: Fuzzy clustering; Incremental clustering; Information bottleneck.

MeSH terms

  • Algorithms
  • Cluster Analysis*
  • Databases, Factual*
  • Fuzzy Logic*