Nonlinear dimensionality reduction of data lying on the multicluster manifold

IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):1111-22. doi: 10.1109/TSMCB.2008.925663.

Abstract

A new method, which is called decomposition-composition (D-C) method, is proposed for the nonlinear dimensionality reduction (NLDR) of data lying on the multicluster manifold. The main idea is first to decompose a given data set into clusters and independently calculate the low-dimensional embeddings of each cluster by the decomposition procedure. Based on the intercluster connections, the embeddings of all clusters are then composed into their proper positions and orientations by the composition procedure. Different from other NLDR methods for multicluster data, which consider associatively the intracluster and intercluster information, the D-C method capitalizes on the separate employment of the intracluster neighborhood structures and the intercluster topologies for effective dimensionality reduction. This, on one hand, isometrically preserves the rigid-body shapes of the clusters in the embedding process and, on the other hand, guarantees the proper locations and orientations of all clusters. The theoretical arguments are supported by a series of experiments performed on the synthetic and real-life data sets. In addition, the computational complexity of the proposed method is analyzed, and its efficiency is theoretically analyzed and experimentally demonstrated. Related strategies for automatic parameter selection are also examined.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Cluster Analysis*
  • Computer Simulation
  • Models, Theoretical*
  • Nonlinear Dynamics*
  • Pattern Recognition, Automated / methods*