Classification of unlabeled online media

Sakthi Kumar Arul Prakash; Conrad Tucker

doi:10.1038/s41598-021-85608-5

Sci Rep. 2021 Mar 25;11(1):6908. doi: 10.1038/s41598-021-85608-5.

Authors

Sakthi Kumar Arul Prakash¹, Conrad Tucker²,³,⁴,⁵,⁶

Affiliations

¹ Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3890, USA.
² Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3890, USA. conradt@andrew.cmu.edu.
³ Department of Machine Learning, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3890, USA. conradt@andrew.cmu.edu.
⁴ The Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3890, USA. conradt@andrew.cmu.edu.
⁵ Department of Biomedical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3890, USA. conradt@andrew.cmu.edu.
⁶ CyLab Security and Privacy Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3890, USA. conradt@andrew.cmu.edu.

Abstract

This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user-user and user-media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user-user and user-media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and their interactions with media content. The discovery that the entropy of user-user and user-media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
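As a rough illustration of the entropy notion mentioned in the abstract, the following Python sketch computes the Shannon entropy of how interactions with each media item are distributed across users. It is not the authors' graphical model: the event format, the sample data, and the names `shannon_entropy` and `entropy_per_media` are all hypothetical, and the abstract does not specify how entropy is actually computed or propagated.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(counts):
    """Shannon entropy in bits of a discrete distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the non-empty outcomes.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def entropy_per_media(events):
    """Map each media id to the entropy of its interactions across users.

    `events` is an iterable of (user_id, media_id) pairs; this layout is an
    assumption made for illustration, not a format taken from the paper.
    """
    per_media = defaultdict(Counter)
    for user_id, media_id in events:
        per_media[media_id][user_id] += 1
    return {m: shannon_entropy(c.values()) for m, c in per_media.items()}


# Hypothetical interaction log (made-up data).
events = [
    ("u1", "m1"), ("u2", "m1"), ("u3", "m1"), ("u4", "m1"),  # spread evenly over 4 users
    ("u1", "m2"), ("u1", "m2"), ("u1", "m2"), ("u1", "m2"),  # all from a single user
]

entropies = entropy_per_media(events)
for media_id, h in sorted(entropies.items()):
    print(media_id, h)
```

In this toy data, interactions with `m1` are spread uniformly over four users, which is the maximum-uncertainty case for four outcomes, while all interactions with `m2` come from one user, which carries no uncertainty. How such per-item quantities relate to fake versus authentic media is the subject of the paper itself and is not reproduced here.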

Publication types

Research Support, U.S. Gov't, Non-P.H.S.