A multicriteria decision making approach for estimating the number of clusters in a data set

PLoS One. 2012;7(7):e41713. doi: 10.1371/journal.pone.0041713. Epub 2012 Jul 27.

Abstract

Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Because this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes an MCDM-based approach to estimating the number of clusters in a given data set. In this approach, MCDM methods treat different numbers of clusters as alternatives and the scores that the output of any clustering algorithm obtains on validity measures as criteria. The proposed method is examined in an experimental study using three MCDM methods, the well-known k-means clustering algorithm, ten relative validity measures, and fifteen public-domain UCI machine learning data sets. The results show that the MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study.
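The core idea, treating candidate cluster counts as alternatives and validity scores as criteria, can be sketched briefly. The sketch below is illustrative only and rests on assumptions not taken from the paper: it uses TOPSIS as the MCDM method, the silhouette width (higher is better) and the Davies-Bouldin index (lower is better) as two stand-in relative measures, equal criterion weights, a plain k-means with deterministic farthest-first seeding, and a hypothetical toy data set. All helper names (`estimate_k`, `topsis`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(members: Sequence[Point]) -> Point:
    return tuple(sum(xs) / len(members) for xs in zip(*members))


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centre
                centers[j] = centroid(members)
    return labels


def groups(points: Sequence[Point], labels: Sequence[int]) -> List[List[Point]]:
    """Non-empty clusters as lists of points."""
    out: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        out.setdefault(lab, []).append(p)
    return list(out.values())


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette width; higher is better (benefit criterion)."""
    clusters = groups(points, labels)
    total = 0.0
    for idx, members in enumerate(clusters):
        if len(members) == 1:
            continue  # a singleton contributes s = 0 by convention
        for p in members:
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for jdx, other in enumerate(clusters) if jdx != idx)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Davies-Bouldin index; lower is better (cost criterion)."""
    clusters = groups(points, labels)
    cents = [centroid(m) for m in clusters]
    scatter = [sum(math.dist(p, c) for p in m) / len(m) for m, c in zip(clusters, cents)]
    n = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                   for j in range(n) if j != i) for i in range(n)) / n


def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Relative closeness of each alternative (row) to the ideal; higher is better."""
    norms = [math.sqrt(sum(v * v for v in col)) or 1.0 for col in zip(*matrix)]
    weighted = [[w * v / n for v, w, n in zip(row, weights, norms)] for row in matrix]
    cols = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in weighted:
        d_pos, d_neg = math.dist(row, ideal), math.dist(row, anti)
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.0)
    return scores


def estimate_k(points: Sequence[Point], candidates: Sequence[int],
               weights: Sequence[float] = (0.5, 0.5)) -> Tuple[int, Dict[int, float]]:
    """Alternatives = candidate k values; criteria = validity scores of k-means output."""
    rows = []
    for k in candidates:
        labels = kmeans(points, k)
        rows.append([silhouette(points, labels), davies_bouldin(points, labels)])
    scores = topsis(rows, weights, benefit=[True, False])
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], dict(zip(candidates, scores))


# Hypothetical toy data: three tight 3x3 grids around well-separated centres.
OFFSETS = (-0.5, 0.0, 0.5)
DATA = [(cx + dx, cy + dy) for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
        for dx in OFFSETS for dy in OFFSETS]

if __name__ == "__main__":
    best_k, scores = estimate_k(DATA, [2, 3, 4, 5])
    print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Each row of the decision matrix holds one candidate k, and TOPSIS ranks the rows by their relative closeness to the ideal point (the best value on every criterion). On the toy data, k = 3 is best on both criteria, so it coincides with the ideal point, receives a closeness of 1.0, and is selected. Adding further validity measures only means adding columns to the matrix and flagging each one as a benefit or cost criterion.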

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Cluster Analysis
  • Decision Making*

Grants and funding

This research has been partially supported by grants from the National Natural Science Foundation of China (#70901011 and #71173028 for YP, #70901015 for GK, and #70921061 for YS), and Program for New Century Excellent Talents in University (NCET-10-0293). No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.