A fast parallelized DBSCAN algorithm based on OpenMp for detection of criminals on streaming services

Front Big Data. 2023 Oct 31:6:1292923. doi: 10.3389/fdata.2023.1292923. eCollection 2023.

Abstract

Introduction: Streaming services are highly popular today. Millions of people watch live streams or videos and listen to music.

Methods: One of the most popular streaming platforms is Twitch, and data from this type of service are a good test case for the parallel DBSCAN algorithm proposed in this paper. Unlike the classical approach to neighbor search, the proposed approach avoids redundancy, i.e., the repetition of the same calculations. At the same time, the algorithm remains based on the classical DBSCAN method with a full search for all neighbors, and it is parallelized by subtasks using OpenMP parallel computing technology.
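The idea of a full neighbor search without repeated calculations can be illustrated with a minimal C++ sketch. It is not the authors' implementation: it assumes that the redundancy being avoided is the double evaluation of the symmetric distance d(i, j) = d(j, i), and that the subtasks are rows of the pairwise comparison. The names `Point`, `neighbor_matrix`, and `dbscan`, the 2D Euclidean metric, and the dense matrix layout are all hypothetical choices made for illustration.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct Point { double x, y; };  // hypothetical 2D feature vector

// Builds a symmetric eps-neighborhood matrix with a full search over all
// pairs. Assumption: each unordered pair (i, j) is evaluated only once and
// the result is mirrored, so no distance is computed twice.
std::vector<char> neighbor_matrix(const std::vector<Point>& pts, double eps) {
    const long n = static_cast<long>(pts.size());
    const double eps2 = eps * eps;
    std::vector<char> adj(static_cast<std::size_t>(n) * n, 0);
    // Rows get shorter as i grows, so dynamic scheduling balances the load.
    // Every matrix element is written by exactly one iteration of i.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        adj[i * n + i] = 1;  // a point belongs to its own neighborhood
        for (long j = i + 1; j < n; ++j) {
            const double dx = pts[i].x - pts[j].x;
            const double dy = pts[i].y - pts[j].y;
            if (dx * dx + dy * dy <= eps2) {
                adj[i * n + j] = 1;
                adj[j * n + i] = 1;  // mirror instead of recomputing
            }
        }
    }
    return adj;
}

// Classical DBSCAN expansion over the precomputed neighborhoods.
// Returns one label per point: -1 for noise, 0..k-1 for clusters.
std::vector<int> dbscan(const std::vector<Point>& pts, double eps, int min_pts) {
    const long n = static_cast<long>(pts.size());
    const std::vector<char> adj = neighbor_matrix(pts, eps);

    // Neighborhood sizes are independent per point, so they parallelize too.
    std::vector<int> degree(n, 0);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        int count = 0;
        for (long j = 0; j < n; ++j) count += adj[i * n + j];
        degree[i] = count;
    }

    std::vector<int> label(n, -1);
    int cluster = 0;
    for (long s = 0; s < n; ++s) {
        // Start a cluster only from an unlabeled core point.
        if (label[s] != -1 || degree[s] < min_pts) continue;
        label[s] = cluster;
        std::queue<long> frontier;
        frontier.push(s);
        while (!frontier.empty()) {
            const long i = frontier.front();
            frontier.pop();
            if (degree[i] < min_pts) continue;  // border points do not expand
            for (long j = 0; j < n; ++j) {
                if (adj[i * n + j] && label[j] == -1) {
                    label[j] = cluster;
                    frontier.push(j);
                }
            }
        }
        ++cluster;
    }
    return label;
}
```

Compiled with an OpenMP flag such as `-fopenmp`, the two marked loops run in parallel. Without the flag the pragmas are ignored and the code runs sequentially with the same result. The dense matrix needs O(n²) memory, which is only reasonable for the medium-sized data the abstract refers to.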

Results: In this work, we sped up DBSCAN-based clustering of medium-sized data without reducing accuracy. The speed-up tends to the number of cores of the multicore computer system, and the efficiency tends to one.
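For reference, the result above can be stated with the standard definitions of speed-up and efficiency. The symbols are assumptions introduced here, not notation taken from the paper: T_1 is the sequential running time, T_p the running time on p cores.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p \, T_p}, \qquad
\text{reported behavior: } S_p \to p, \quad E_p \to 1 .
```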

Discussion: Before the numerical experiments were conducted, theoretical estimates of speed-up and efficiency were obtained; the experimental results agreed with these estimates, confirming their validity. The quality of the clustering was verified using the silhouette value. All experiments were conducted on different percentages of medium-sized datasets. The proposed algorithm has potential applications in fields such as advertising, marketing, cybersecurity, and sociology. It is worth mentioning that datasets of this kind are often used for detecting fraud on the Internet, which makes an algorithm that considers all neighbors a useful tool for such research.
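The silhouette value used for the quality check is commonly defined as follows. This is the standard textbook form and may differ in detail from the variant used in the paper. For a point i, a(i) is the mean distance from i to the other points of its own cluster, and b(i) is the smallest mean distance from i to the points of any other cluster.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1 .
```

Values close to 1 indicate that a point lies well inside its cluster, and the mean of s(i) over all clustered points gives a single quality score.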

Keywords: OpenMP technology; clusterization; efficiency; recommender systems; silhouette value; speed-up.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. The work was supported by the scientific direction "Analysis of big data" of the Department of Artificial Intelligence Systems, Lviv Polytechnic National University.