Multivariate functional data clustering using adaptive density peak detection

Stat Med. 2023 May 10;42(10):1565-1582. doi: 10.1002/sim.9687. Epub 2023 Feb 24.

Abstract

Clustering for multivariate functional data is a challenging problem since the data are represented by a set of curves and functions belonging to an infinite-dimensional space. In this article, we propose a novel clustering method for multivariate functional data using an adaptive density peak detection technique. It is a quick cluster center identification algorithm based on the two measures of each functional data observation: the functional density estimate and the distance to the closest observation with a higher functional density. We suggest two types of functional density estimators for multivariate functional data. The first one is a functional $k$-nearest neighbor density estimator based on (a) an $L^2$ distance between raw functional curves, or (b) a semimetric of multivariate functional principal components. The second one is a $k$-nearest neighbor density estimator based on multivariate functional principal scores. Our clustering method is computationally fast since it does not need an iterative process. The flexibility and advantages of the method are examined by comparing it with other existing clustering methods in simulation studies. A user-friendly R package FADPclust is developed for public use. Finally, our method is applied to a real case study in lung cancer research.
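The two measures described above can be illustrated with a minimal Python sketch. It is not the FADPclust implementation and makes several assumptions: curves are observed on a common equally spaced grid, the $L^2$ distance is approximated by a Riemann sum, the density is replaced by a simple monotone proxy (the reciprocal of the distance to the $k$-th nearest neighbor), and centers are chosen as the observations with the largest product of density and distance, a common heuristic that may differ from the selection rule used in the article. All function names are hypothetical.

```python
import numpy as np

def l2_distance_matrix(curves, dt):
    """Pairwise L2 distances between multivariate curves.

    curves has shape (n, p, T): n observations, p component functions,
    T grid points with spacing dt (assumed equally spaced).
    """
    n = curves.shape[0]
    flat = curves.reshape(n, -1)
    diff = flat[:, None, :] - flat[None, :, :]
    # Riemann-sum approximation of the integrated squared difference,
    # summed over the p components.
    return np.sqrt((diff ** 2).sum(axis=2) * dt)

def knn_density_proxy(dist, k):
    """Reciprocal of the distance to the k-th nearest neighbour.

    Column 0 of each sorted row is the zero self-distance, so index k
    is the k-th neighbour. Assumes no exact duplicate curves.
    """
    kth = np.sort(dist, axis=1)[:, k]
    return 1.0 / kth

def density_peak_clusters(dist, density, n_clusters):
    """Non-iterative density peak clustering on a distance matrix."""
    n = len(density)
    order = np.argsort(-density, kind="stable")  # descending density
    delta = np.empty(n)               # distance to closest denser point
    parent = np.full(n, -1)           # index of that denser point
    delta[order[0]] = dist.max()      # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Assumed heuristic: centers have the largest density * delta.
    centers = np.argsort(-(density * delta), kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Single pass in descending density: each remaining point inherits
    # the label of its closest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Synthetic bivariate curves: two groups separated by a vertical shift.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
base = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
group_a = base + 0.05 * rng.standard_normal((10, 2, 50))
group_b = base + 3.0 + 0.05 * rng.standard_normal((10, 2, 50))
curves = np.concatenate([group_a, group_b])

dist = l2_distance_matrix(curves, dt=t[1] - t[0])
density = knn_density_proxy(dist, k=3)
labels, centers = density_peak_clusters(dist, density, n_clusters=2)
```

Because every observation is labeled in one pass over the density ordering, no iterative refinement is required, which mirrors the computational advantage claimed for the method.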

Keywords: $k$-nearest neighbor density estimation; clustering; density peak detection; multivariate functional data.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computer Simulation
  • Humans