Outlier Detection Using Structural Scores in a High-Dimensional Space

IEEE Trans Cybern. 2020 May;50(5):2302-2310. doi: 10.1109/TCYB.2018.2876615. Epub 2018 Nov 7.

Abstract

Outlier detection has drawn significant interest from both academia and industry, with applications such as network intrusion detection. Most existing methods rely, implicitly or explicitly, on distances in Euclidean space. However, because of the curse of dimensionality, the Euclidean distance may fail to measure similarity among high-dimensional data, which leads to inferior performance in practice. This paper presents an approach to outlier detection based on meaningful structural scores. If two points have similar features, the difference between their structural scores is small, and vice versa. The scores are calculated by measuring the variance of angles weighted by data representation, which brings the global data structure into the measurement. As a result, the proposed method ranks similar points more consistently. Compared with existing methods, our structural scores better reflect the characteristics of data in a high-dimensional space. Experiments on synthetic and several real-world datasets demonstrate the effectiveness and efficiency of the proposed method.
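To make the idea of an angle-based score concrete, the following is a minimal sketch of a plain, unweighted angle-variance score, in the spirit of general angle-based outlier detection. It is not the method proposed in the paper: the abstract does not specify how angles are weighted by the data representation, so that weighting is omitted here. The function name `angle_variance_scores`, the synthetic data, and the convention that a lower variance suggests an outlier are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def angle_variance_scores(points):
    """For each point, return the variance of the cosines of the angles
    formed by pairs of difference vectors to all other points.

    Assumption (not from the paper): unweighted angles, and a LOWER
    variance is read as "more outlying", since a far-away point sees
    all other points in roughly the same direction.
    """
    scores = []
    for i, p in enumerate(points):
        units = []
        for j, q in enumerate(points):
            if j == i:
                continue
            diff = [b - a for a, b in zip(p, q)]
            norm = math.sqrt(sum(d * d for d in diff))
            if norm > 0:  # skip exact duplicates of p
                units.append([d / norm for d in diff])
        cosines = [
            sum(a * b for a, b in zip(u, v))
            for u, v in combinations(units, 2)
        ]
        mean = sum(cosines) / len(cosines)
        scores.append(sum((c - mean) ** 2 for c in cosines) / len(cosines))
    return scores


# Hypothetical synthetic data: 30 Gaussian inliers in 20 dimensions
# plus one point placed far from the cluster (index 30).
rng = random.Random(0)
dim = 20
points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(30)]
points.append([6.0] * dim)

scores = angle_variance_scores(points)
most_outlying = min(range(len(scores)), key=scores.__getitem__)
print("index with the smallest angle variance:", most_outlying)
```

This sketch compares every pair of neighbors for every point, so its cost grows cubically with the number of points. It is meant only to show why angles can remain informative when raw Euclidean distances concentrate, not to reproduce the efficiency reported in the abstract.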