A hybrid anomaly detection method for high dimensional data

PeerJ Comput Sci. 2023 Jan 12:9:e1199. doi: 10.7717/peerj-cs.1199. eCollection 2023.

Abstract

Anomaly detection of high-dimensional data is a challenge because the sparsity of the data distribution caused by high dimensionality hardly provides rich information distinguishing anomalous instances from normal instances. To address this, this article proposes an anomaly detection method combining an autoencoder and a sparse weighted least squares-support vector machine. First, the autoencoder is used to extract those low-dimensional features of high-dimensional data, thus reducing the dimension and the complexity of the searching space. Then, in the low-dimensional feature space obtained by the autoencoder, the sparse weighted least squares-support vector machine separates anomalous and normal features. Finally, the learned class labels to be used to distinguish normal instances and abnormal instances are outputed, thus achieving anomaly detection of high-dimensional data. The experiment results on real high-dimensional datasets show that the proposed method wins over competing methods in terms of anomaly detection ability. For high-dimensional data, using deep methods can reconstruct the layered feature space, which is beneficial for gaining those advanced anomaly detection results.

Keywords: Anomaly detection; Autoencoder; High-dimensional data; Support vector machine.

Grants and funding

The authors received no funding for this work.