A systematic literature review on spam content detection and classification

Sanaa Kaddoura; Ganesh Chandrasekaran; Daniela Elena Popescu; Jude Hemanth Duraisamy

doi:10.7717/peerj-cs.830

PeerJ Comput Sci. 2022 Jan 20:8:e830. doi: 10.7717/peerj-cs.830. eCollection 2022.

Authors

Sanaa Kaddoura¹, Ganesh Chandrasekaran², Daniela Elena Popescu³, Jude Hemanth Duraisamy⁴

Affiliations

¹ Zayed University, Abu Dhabi, United Arab Emirates.
² Electronics and Communication Engineering, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India.
³ Faculty of Electrical Engineering and Information Technology, University of Oradea, Oradea, Romania.
⁴ Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India.

Abstract

Spam content on social media is increasing rapidly, and detecting it has therefore become vital. Spam grows as people make extensive use of platforms such as Facebook, Twitter, YouTube, and e-mail. The time people spend on social media keeps growing, especially during the pandemic. Users receive many text messages through social media and may not recognize the spam content among them. Spam messages contain malicious links, apps, fake accounts, fake news, reviews, rumors, etc. Detecting and controlling spam text is essential to improving social media security. This paper presents a detailed survey of the latest developments in spam text detection and classification in social media. It discusses the techniques used for spam detection and classification, including Machine Learning, Deep Learning, and text-based approaches. We also present the challenges encountered in identifying spam, along with its control mechanisms and the datasets used in existing work on spam detection.

Keywords: Classification; Data mining; Deep learning; Machine learning; Natural language processing; Social media analysis; Spam Content; Text mining.

Grants and funding

This work was funded by a Zayed University Start-up research grant (Grant number R20081). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.