Determining the number of clusters by sampling with replacement

Psychol Methods. 2004 Jun;9(2):238-49. doi: 10.1037/1082-989X.9.2.238.

Abstract

A split-sample replication criterion originally proposed by J. E. Overall and K. N. Magee (1992) as a stopping rule for hierarchical cluster analysis is applied to multiple data sets generated by sampling with replacement from an original simulated primary data set. An investigation of the validity of this bootstrap procedure was undertaken using different combinations of the true number of latent populations, degrees of overlap, and sample sizes. The bootstrap procedure enhanced the accuracy of identifying the true number of latent populations under virtually all conditions. Increasing the size of the resampled data sets relative to the size of the primary data set further increased accuracy. A computer program to implement the bootstrap stopping rule is made available via a referenced Web site.
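The general shape of the procedure described above can be illustrated with a minimal sketch, using only the Python standard library. Several details are assumptions and are not taken from the article: Ward-style agglomeration as the hierarchical method, the adjusted Rand index between nearest-centroid cross-classification and independent clustering as the replication measure, a per-resample vote for the largest k that reaches the top replication score, and all function and parameter names (`ward_labels`, `replication_by_k`, `bootstrap_number_of_clusters`, `k_max`, `n_boot`, `resample_size`). The exact criterion of Overall and Magee (1992) and the authors' program may differ.

```python
import random
from collections import Counter
from math import comb

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def ward_labels(points, k_max):
    """Naive Ward-style agglomeration; returns {k: labels} for k <= k_max."""
    clusters = [[i] for i in range(len(points))]
    labels_by_k = {}
    while len(clusters) > 1:
        cents = [centroid([points[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * sq_dist(cents[a], cents[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
        if len(clusters) <= k_max:
            labels = [0] * len(points)
            for lab, members in enumerate(clusters):
                for i in members:
                    labels[i] = lab
            labels_by_k[len(clusters)] = labels
    return labels_by_k

def adjusted_rand(x, y):
    """Adjusted Rand index between two labelings of the same items."""
    sum_xy = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    sum_x = sum(comb(c, 2) for c in Counter(x).values())
    sum_y = sum(comb(c, 2) for c in Counter(y).values())
    expected = sum_x * sum_y / comb(len(x), 2)
    max_index = (sum_x + sum_y) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_xy - expected) / (max_index - expected)

def replication_by_k(sample, k_max, rng):
    """Split the sample in half, cluster each half, score agreement per k."""
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    a, b = shuffled[:half], shuffled[half:]
    labels_a, labels_b = ward_labels(a, k_max), ward_labels(b, k_max)
    scores = {}
    for k in range(2, k_max + 1):
        cents = [centroid([p for p, lab in zip(a, labels_a[k]) if lab == c])
                 for c in range(k)]
        # Classify half B by the nearest centroid from half A.
        assigned = [min(range(k), key=lambda c: sq_dist(p, cents[c])) for p in b]
        scores[k] = adjusted_rand(assigned, labels_b[k])
    return scores

def bootstrap_number_of_clusters(data, k_max=6, n_boot=20,
                                 resample_size=None, seed=0):
    """Vote over bootstrap resamples (assumed aggregation rule)."""
    rng = random.Random(seed)
    size = resample_size or len(data)
    votes = Counter()
    for _ in range(n_boot):
        resample = rng.choices(data, k=size)  # sampling with replacement
        scores = replication_by_k(resample, k_max, rng)
        top = max(scores.values())
        # Assumed tie-break: the largest k that reaches the top score.
        votes[max(k for k, s in scores.items() if s >= top - 1e-12)] += 1
    return votes.most_common(1)[0][0], votes

# Simulated primary data set: three well-separated 2-D populations.
gen = random.Random(1)
centers = [(0, 0), (10, 0), (0, 10)]
data = [(gen.gauss(cx, 1), gen.gauss(cy, 1))
        for cx, cy in centers for _ in range(20)]
best_k, votes = bootstrap_number_of_clusters(data, k_max=5, n_boot=10)
print(best_k, dict(votes))
```

Setting `resample_size` above `len(data)` corresponds to the condition in which the resampled data sets are larger than the primary data set. The pairwise search is quadratic per merge, so this sketch is only practical for small samples.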

MeSH terms

  • Cluster Analysis*
  • Humans
  • Psychology / methods*
  • Sampling Studies