IFND: a benchmark dataset for fake news detection

Dilip Kumar Sharma; Sonal Garg

doi:10.1007/s40747-021-00552-1


Complex Intell Systems. 2023;9(3):2843-2863. doi: 10.1007/s40747-021-00552-1. Epub 2021 Oct 16.

Authors

Dilip Kumar Sharma¹, Sonal Garg¹

Affiliation

¹ GLA University, Mathura, India.

Abstract

Spotting fake news is a critical problem today. Social media are a major channel for propagating fake news. Fake news spread over digital platforms generates confusion and induces biased perspectives in people, so detecting misinformation on these platforms is essential to mitigate its adverse impact. Many approaches have been implemented in recent years. Despite this productive work, fake news identification still poses many challenges because no comprehensive, publicly available benchmark dataset exists; in particular, there is no large-scale dataset consisting solely of Indian news. This paper therefore presents IFND (Indian Fake News Dataset). The dataset contains both text and images, and most of its content concerns events from 2013 to 2021. The content is scraped using the ParseHub tool. To increase the number of fake news items in the dataset, an intelligent augmentation algorithm is used that generates meaningful fake news statements. Latent Dirichlet allocation (LDA) is employed for topic modelling to assign categories to news statements. Various machine learning and deep-learning classifiers are applied to the text and image modalities to evaluate performance on the proposed IFND dataset. A multi-modal approach that considers both textual and visual features is also proposed for fake news detection. The experiments on the proposed IFND dataset achieved satisfactory results. This study affirms that the availability of such a large dataset can stimulate research on this challenging problem and lead to better prediction models.
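The abstract states that LDA topic modelling assigns categories to news statements but does not give the implementation, the number of topics, or the hyperparameters. The sketch below illustrates the general idea only: LDA fitted with collapsed Gibbs sampling (a standard inference method, not necessarily the one the authors used), after which each statement is labelled with its dominant topic. The function names `lda_gibbs` and `assign_categories`, the toy headlines, whitespace tokenization, and the values of `alpha`, `beta`, `iters`, and `n_topics` are all illustrative assumptions, not the IFND settings.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=300, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper (not from the paper). Returns
    (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # count of word v in topic k
    nk = [0] * n_topics                              # total tokens in topic k
    z = []                                           # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | all other assignments), up to a normalizing constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def assign_categories(docs, n_topics, n_top=3, **lda_kwargs):
    """Label each document with its dominant topic and list top words per topic."""
    ndk, nkw, vocab = lda_gibbs(docs, n_topics, **lda_kwargs)
    dominant = [max(range(n_topics), key=row.__getitem__) for row in ndk]
    top_words = [
        [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -nkw[k][j])[:n_top]]
        for k in range(n_topics)
    ]
    return dominant, top_words


# Toy, made-up headlines for illustration only (not IFND data).
HEADLINES = [
    "minister announces election rally in state",
    "opposition party wins state election vote",
    "vaccine drive covers rural hospital patients",
    "hospital reports new vaccine doses for patients",
]
docs = [h.lower().split() for h in HEADLINES]
categories, topic_words = assign_categories(docs, n_topics=2)
print(categories)   # dominant topic index per headline
print(topic_words)  # most frequent words per topic
```

In this sketch, topics are anonymous indices; turning them into human-readable categories (for example, "politics" or "health") would still require inspecting the top words of each topic, and a real corpus would also need stop-word removal and a tuned number of topics.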

Keywords: Deep-learning; Fake news detection; Indian dataset; LDA topic modelling; Machine learning.

Publication types

News