Automatic detection of cyberbullying in social media text

PLoS One. 2018 Oct 8;13(10):e0203794. doi: 10.1371/journal.pone.0203794. eCollection 2018.

Abstract

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute most to the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields F1 scores of 64% and 61% for English and Dutch, respectively, and considerably outperforms baseline systems.
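To illustrate the general approach described in the abstract (binary classification of posts with a linear support vector machine over lexical features), the following is a minimal sketch using scikit-learn. The toy posts, labels, and tf-idf word n-gram features are illustrative assumptions only; they do not reproduce the authors' annotated corpus, feature set, or hyperparameter optimisation.

```python
# Minimal sketch: linear SVM for binary cyberbullying-related post detection.
# Data and features below are hypothetical placeholders, not the study's own.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy examples; the study used annotated English and Dutch posts.
posts = [
    "you are such a loser, nobody likes you",
    "had a great time at the party last night",
    "stop messaging me or you'll regret it",
    "congrats on the new job, well deserved!",
]
labels = [1, 0, 1, 0]  # 1 = cyberbullying-related, 0 = not

# Word uni- and bigram tf-idf features stand in for the paper's richer feature set.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LinearSVC(C=1.0),  # C would be tuned on held-out data, as in the paper
)
pipeline.fit(posts, labels)
print(pipeline.predict(["everyone hates you, just leave"]))
```

In practice, such a classifier would be trained on the full annotated corpus and evaluated on a hold-out test set with F1 as the reported metric.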

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Crime Victims / psychology
  • Cyberbullying / psychology*
  • Humans
  • Internet*
  • Language
  • Semantics*
  • Social Media*
  • Support Vector Machine

Grants and funding

The work presented in this paper was carried out in the framework of the AMiCA project (IWT SBO 120007) to WD and VH, funded by the Flemish government agency Flanders Innovation & Entrepreneurship (VLAIO); http://www.vlaio.be. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.