Online Sparse Representation Clustering for Evolving Data Streams

IEEE Trans Neural Netw Learn Syst. 2023 Oct 27:PP. doi: 10.1109/TNNLS.2023.3325556. Online ahead of print.

Abstract

Data stream clustering discovers the patterns underlying continuously arriving sequences of data. A number of data stream clustering algorithms, such as density-based clustering algorithms, have been proposed for finding clusters of arbitrary shape and handling outliers. However, these algorithms construct and merge microclusters by measuring the Euclidean distances between high-dimensional data objects, which often limits their ability to, e.g., transfer valuable knowledge from historical landmark windows to the current landmark window and exploit evolving subspace structures adaptively. We propose an online sparse representation clustering (OSRC) method that learns an affinity matrix for evaluating the relationships among high-dimensional data objects in evolving data streams. We first introduce a low-dimensional projection (LDP) into sparse representation to adaptively reduce the potential negative influence of the noise and redundancy contained in high-dimensional data. Then, we use the ℓ2,1-norm optimization technique to choose an appropriate number of representative data objects and form a specific dictionary for sparse representation. This dictionary is integrated into sparse representation to adaptively exploit the evolving subspace structures of the high-dimensional data objects. Moreover, the representative data objects from the current landmark window can transfer valuable knowledge to the next landmark window. Experimental results on a synthetic dataset and six benchmark datasets validate the effectiveness of the proposed method compared with state-of-the-art data stream clustering methods.
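As a rough illustration of why an ℓ2,1-norm penalty can select representative data objects, the sketch below computes the ℓ2,1-norm of a coefficient matrix (the sum of the Euclidean norms of its rows) and applies the standard row-wise shrinkage operator associated with that norm. Rows that are shrunk to zero correspond to data objects that would not be kept as representatives. This is not the OSRC algorithm from the paper: the function names, the toy matrix `C`, and the threshold `tau` are all hypothetical and chosen only for illustration.

```python
import math


def row_norm(row):
    """Euclidean norm of a single row."""
    return math.sqrt(sum(v * v for v in row))


def l21_norm(matrix):
    """l2,1-norm: sum of the Euclidean norms of the rows."""
    return sum(row_norm(row) for row in matrix)


def row_shrink(matrix, tau):
    """Row-wise shrinkage (proximal operator of tau * l2,1-norm).

    Each row's norm is reduced by tau; rows whose norm is at most tau
    are set entirely to zero, which is what produces row sparsity.
    """
    shrunk = []
    for row in matrix:
        norm = row_norm(row)
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        shrunk.append([scale * v for v in row])
    return shrunk


def selected_rows(matrix, tol=1e-12):
    """Indices of rows that remain nonzero (candidate representatives)."""
    return [i for i, row in enumerate(matrix) if row_norm(row) > tol]


# Hypothetical coefficient matrix: one row per candidate data object.
C = [[3.0, 4.0], [0.1, 0.0], [0.0, 2.0]]
C_shrunk = row_shrink(C, tau=1.0)
representatives = selected_rows(C_shrunk)
```

In this toy case the second row has a small norm and is zeroed out by the shrinkage step, so only the first and third rows survive as candidates. A larger `tau` would remove more rows, which is how the penalty weight controls the number of selected representatives in this kind of formulation.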