A Performance Comparison of Unsupervised Techniques for Event Detection from Oscar Tweets

Muzamil Malik; Waqar Aslam; Zahid Aslam; Abdullah Alharbi; Bader Alouffi; Hafiz Tayyab Rauf

doi:10.1155/2022/5980043

Comput Intell Neurosci. 2022 May 24:2022:5980043. doi: 10.1155/2022/5980043. eCollection 2022.

Authors

Muzamil Malik¹, Waqar Aslam¹, Zahid Aslam¹, Abdullah Alharbi², Bader Alouffi³, Hafiz Tayyab Rauf⁴

Affiliations

¹ Department of Computer Science & Information Technology, Islamia University of Bahawalpur, Bahawalpur, Pakistan.
² Department of Information Technology, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia.
³ Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia.
⁴ Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent, UK.

Abstract

Social media influences people's lives. It is an essential source for sharing news, raising awareness, and detecting events and people's interests. Social media covers a wide range of topics and events. Extensive work has been published on capturing interesting events and insights from datasets, and many techniques have been presented to detect events from social media networks such as Twitter. In text mining, most work is done on a specific dataset, so new datasets are needed to analyse the performance and generic nature of Topic Detection and Tracking methods. Therefore, this paper publishes a dataset of a real-life event, the Oscars 2018, gathered from Twitter, and compares soft frequent pattern mining (SFPM), singular value decomposition and k-means (K-SVD), feature-pivot (Feat-p), document-pivot (Doc-p), and latent Dirichlet allocation (LDA). The dataset contains 2,160,738 tweets collected using a set of seed words. Only English tweets are considered. All of the methods applied in this paper are unsupervised. This area needs to be explored on different datasets. The methods are evaluated on the Oscars 2018 dataset using keyword precision (K-Prec), keyword recall (K-Rec), and topic recall (T-Rec) for detecting events of greater interest. SFPM achieved the highest K-Prec, K-Rec, and T-Rec, but these values started to decrease as the number of clusters increased. Feat-p achieved the lowest performance on all three metrics. Experiments on the Oscars 2018 dataset demonstrated that all the methods are generic in nature and produce meaningful clusters.
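The abstract names the three evaluation metrics without defining them. The following is a minimal sketch of one common way such metrics are computed, not the paper's own implementation. It assumes each topic is a set of keywords, that a ground-truth topic counts as detected when its best-matching detected topic shares at least `min_overlap` keywords with it, and that keyword precision and recall are micro-averaged over the matched topics. The function name `evaluate_topics`, the `min_overlap` parameter, and the toy keyword sets are all hypothetical and are not taken from the Oscars 2018 dataset.

```python
def evaluate_topics(detected, ground_truth, min_overlap=1):
    """Return (K-Prec, K-Rec, T-Rec) for lists of keyword sets.

    Assumed matching rule (not from the paper): each ground-truth topic is
    paired with the detected topic sharing the most keywords with it, and
    counts as detected if that overlap is at least `min_overlap`.
    """
    matched_topics = 0      # ground-truth topics that were detected
    correct_keywords = 0    # shared keywords over all matched pairs
    detected_keywords = 0   # keywords in the detected topics that matched
    truth_keywords = sum(len(t) for t in ground_truth)

    for truth in ground_truth:
        # Best-matching detected topic for this ground-truth topic.
        best = max(detected, key=lambda d: len(d & truth), default=set())
        overlap = len(best & truth)
        if overlap >= min_overlap:
            matched_topics += 1
            correct_keywords += overlap
            detected_keywords += len(best)

    t_rec = matched_topics / len(ground_truth) if ground_truth else 0.0
    k_prec = correct_keywords / detected_keywords if detected_keywords else 0.0
    k_rec = correct_keywords / truth_keywords if truth_keywords else 0.0
    return k_prec, k_rec, t_rec


# Toy input, invented for illustration only.
detected = [
    {"oscars", "best", "picture", "winner"},
    {"red", "carpet"},
    {"weather", "rain"},
]
ground_truth = [
    {"best", "picture", "winner", "announced"},
    {"acceptance", "speech", "actress"},
]
k_prec, k_rec, t_rec = evaluate_topics(detected, ground_truth)
print(f"K-Prec={k_prec:.2f}  K-Rec={k_rec:.2f}  T-Rec={t_rec:.2f}")
```

Under these assumptions, a method that returns more clusters can match more ground-truth topics, but it also adds more keywords that may dilute precision, which is the kind of trade-off the cluster-count observation above refers to.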

MeSH terms

Data Mining
Humans
Social Media*
Social Networking