A web framework for information aggregation and management of multilingual hate speech

Heliyon. 2023 May 9;9(5):e16084. doi: 10.1016/j.heliyon.2023.e16084. eCollection 2023 May.

Abstract

Social media platforms have led to the creation of a vast amount of information produced by users and published publicly, facilitating participation in the public sphere, but also giving the opportunity for certain users to publish hateful content. This content mainly involves offensive/discriminative speech towards social groups or individuals (based on racial, religious, gender or other characteristics) and could possibly lead into subsequent hate actions/crimes due to persistent escalation. Content management and moderation in big data volumes can no longer be supported manually. In the current research, a web framework is presented and evaluated for the collection, analysis, and aggregation of multilingual textual content from various online sources. The framework is designed to address the needs of human users, journalists, academics, and the public to collect and analyze content from social media and the web in Spanish, Italian, Greek, and English, without prior training or a background in Computer Science. The backend functionality provides content collection and monitoring, semantic analysis including hate speech detection and sentiment analysis using machine learning models and rule-based algorithms, storing, querying, and retrieving such content along with the relevant metadata in a database. This functionality is assessed through a graphic user interface that is accessed using a web browser. An evaluation procedure was held through online questionnaires, including journalists and students, proving the feasibility of the use of the proposed framework by non-experts for the defined use-case scenarios.

Keywords: Content management and moderation; Hate speech; Information retrieval; Machine learning; Multilingual; Natural language processing; Sentiment analysis; Social media.