Visual Structural Assessment and Anomaly Detection for High-Velocity Data Streams

IEEE Trans Cybern. 2021 Dec;51(12):5979-5992. doi: 10.1109/TCYB.2020.2973137. Epub 2021 Dec 22.

Abstract

The widespread use of Internet-of-Things (IoT) technologies, smartphones, and social media services generates huge amounts of data that stream at high velocity. Automatic interpretation of these rapidly arriving data streams is required for the timely detection of interesting events, which usually emerge in the form of clusters. This article proposes a new relative of the visual assessment of cluster tendency (VAT) model, which produces a record of structural evolution in the data stream by building a cluster heat map of the entire processing history of the stream. The existing VAT-based algorithms for streaming data, called inc-VAT/inc-iVAT and dec-VAT/dec-iVAT, are not suitable for high-velocity, high-volume streaming data because of their high memory requirements and because their processing slows as the accumulated data grows. The scalable iVAT (siVAT) algorithm can handle big batch data, but for streaming data it must be reapplied every time a new data point arrives, which is not feasible because of the computational cost. To address this problem, we propose an incremental siVAT algorithm, called inc-siVAT, which processes the streaming data in chunks. It first extracts a small smart sample using an intelligent sampling scheme called maximin random sampling (MMRS); it then incrementally updates the smart sample points on the fly, using our novel incremental MMRS (inc-MMRS) algorithm, to reflect changes in the data stream after each chunk is processed; finally, it produces an incrementally built iVAT image of the updated smart sample using the inc-VAT/inc-iVAT and dec-VAT/dec-iVAT algorithms. These images can be used to visualize the evolving cluster structure and to detect anomalies in streaming data. Our method is illustrated with one synthetic and four real datasets, two of which evolve significantly over time. Our numerical experiments demonstrate the algorithm's ability to successfully identify anomalies and visualize changing cluster structure in streaming data.
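
The abstract names the building blocks but gives no algorithmic detail, so the following is only a minimal batch sketch of two of the ideas it mentions, written under stated assumptions. It assumes that the maximin step can be approximated by greedy farthest-first selection and that the heat map is ordered with the standard Prim-like VAT reordering of a dissimilarity matrix. It does not implement the random-sampling part of MMRS, the iVAT transform, or any of the incremental (inc-/dec-) updates described in the article. All function names are hypothetical and are not taken from the article.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean dissimilarity matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def maximin_sample(X, k, rng):
    """Greedy farthest-first selection of k row indices (assumed maximin step).

    Starts from a random point, then repeatedly adds the point whose distance
    to the already chosen set is largest.
    """
    first = int(rng.integers(len(X)))
    chosen = [first]
    # d_min[j] = distance from point j to its nearest chosen point
    d_min = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d_min))
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(X - X[nxt], axis=1))
    return chosen


def vat_order(D):
    """Standard VAT reordering of a symmetric dissimilarity matrix D.

    Begins at one endpoint of the largest dissimilarity, then repeatedly
    appends the unselected point closest to the selected set.
    """
    n = D.shape[0]
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # d_to_set[j] = smallest dissimilarity between j and any selected point
    d_to_set = D[start].copy()
    for _ in range(n - 1):
        masked = np.where(remaining, d_to_set, np.inf)
        j = int(np.argmin(masked))
        order.append(j)
        remaining[j] = False
        d_to_set = np.minimum(d_to_set, D[j])
    return order


def vat_image(X, k, rng):
    """Reordered dissimilarity matrix of a maximin sample of X."""
    idx = maximin_sample(X, k, rng)
    D = pairwise_distances(X[idx])
    order = vat_order(D)
    return D[np.ix_(order, order)], [idx[i] for i in order]
```

Displaying the matrix returned by `vat_image` as a grayscale image would show dark blocks along the diagonal where the sampled points form compact groups. In a streaming setting this batch sketch would have to be rerun for every chunk, which is the cost the incremental algorithms in the article are designed to avoid.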

MeSH terms

  • Algorithms*
  • Humans