Using Deep Learning to Identify Linguistic Features that Facilitate or Inhibit the Propagation of Anti- and Pro-Vaccine Content on Social Media

Young Anna Argyris; Nan Zhang; Bidhan Bashyal; Pang-Ning Tan

doi:10.1109/icdh55609.2022.00025


2022 IEEE Int Conf Digit Health IEEE ICDH 2022 (2022). 2022 Jul:2022:107-116. doi: 10.1109/icdh55609.2022.00025. Epub 2022 Aug 24.

Authors

Young Anna Argyris¹, Nan Zhang², Bidhan Bashyal³, Pang-Ning Tan³

Affiliations

¹ Dept of Media and Information, Michigan State University, East Lansing, MI.
² Dept of Advertising and Public Relations, Michigan State University, East Lansing, MI.
³ Dept of Computer Science and Engineering, Michigan State University, East Lansing, MI.

Abstract

Anti-vaccine content spreads rapidly via social media, fostering vaccine hesitancy, while pro-vaccine content has not replicated this success. Despite this disparity in the dissemination of anti- and pro-vaccine posts, the linguistic features that facilitate or inhibit the propagation of vaccine-related content remain poorly understood. Moreover, most prior machine-learning algorithms have classified social-media posts into binary categories (e.g., misinformation or not) and have rarely tackled a higher-order classification task based on divergent perspectives about vaccines (i.e., anti-vaccine, pro-vaccine, and neutral). Our objectives are (1) to identify sets of linguistic features that facilitate and inhibit the propagation of vaccine-related content and (2) to compare whether anti-vaccine, pro-vaccine, or neutral tweets contain either set more frequently than the others. To achieve these goals, we collected a large set of social media posts (over 120 million tweets) between Nov. 15 and Dec. 15, 2021, coinciding with the Omicron variant surge. We developed a two-stage framework using a fine-tuned BERT classifier, which achieved over 99 percent and over 80 percent accuracy for binary and ternary classification, respectively. The Linguistic Inquiry and Word Count (LIWC) text analysis tool was then used to count linguistic features in each classified tweet. Our regression results show that anti-vaccine tweets tend to be propagated (i.e., retweeted), while pro-vaccine tweets tend to garner passive endorsements (i.e., are favorited). Our results also yielded two sets of linguistic features that act, respectively, as facilitators and inhibitors of the propagation of vaccine-related tweets. Finally, our regression results show that anti-vaccine tweets tend to use the facilitators, whereas pro-vaccine tweets tend to employ the inhibitors.
The findings and algorithms from this study will aid public health officials' efforts to counteract vaccine misinformation, thereby facilitating the delivery of preventive measures during pandemics and epidemics.
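The abstract does not spell out how the two classification stages divide the work. A minimal Python sketch, assuming stage 1 separates vaccine-related from unrelated tweets (the binary task) and stage 2 assigns an anti-vaccine, pro-vaccine, or neutral stance (the ternary task), illustrates that routing together with LIWC-style dictionary counting. Everything here is hypothetical: `two_stage_classify`, `category_percentages`, `TOY_DICT`, and the keyword classifiers are illustrative stand-ins for the fine-tuned BERT models, and the toy dictionary is not the proprietary LIWC lexicon. Only the general idea of reporting each category as a percentage of total words, with `*` entries acting as prefix wildcards, is borrowed from LIWC.

```python
import re
from typing import Callable, Dict, List, Optional

# A classifier maps tweet text to a label. In the study these would be fine-tuned
# BERT models; here they are hypothetical stand-ins.
Classifier = Callable[[str], str]


def two_stage_classify(text: str, relevance: Classifier, stance: Classifier) -> Optional[str]:
    """Assumed routing: stage 1 filters vaccine-related tweets, stage 2 assigns stance."""
    if relevance(text) != "vaccine":
        return None  # unrelated tweets never reach the stance classifier
    return stance(text)  # e.g. "anti", "pro", or "neutral"


TOKEN = re.compile(r"[a-z']+")


def category_percentages(text: str, dictionary: Dict[str, List[str]]) -> Dict[str, float]:
    """LIWC-style counting: percentage of words in `text` matching each category.

    Entries ending in '*' match any word that starts with the preceding prefix.
    """
    words = TOKEN.findall(text.lower())
    if not words:
        return {category: 0.0 for category in dictionary}
    result = {}
    for category, entries in dictionary.items():
        exact = {e for e in entries if not e.endswith("*")}
        prefixes = tuple(e[:-1] for e in entries if e.endswith("*"))
        # str.startswith(()) is False, so categories without wildcards work too
        hits = sum(1 for w in words if w in exact or w.startswith(prefixes))
        result[category] = 100.0 * hits / len(words)
    return result


# Hypothetical toy categories for illustration only (not the real LIWC dictionary).
TOY_DICT = {
    "negemo": ["fear*", "danger*", "hurt"],
    "we": ["we", "us", "our"],
}


def toy_relevance(text: str) -> str:
    # Keyword stand-in for the binary (vaccine-related or not) model.
    return "vaccine" if "vaccin" in text.lower() else "other"


def toy_stance(text: str) -> str:
    # Keyword stand-in for the ternary stance model.
    return "anti" if "danger" in text.lower() else "neutral"


tweet = "We fear the vaccine is dangerous"
label = two_stage_classify(tweet, toy_relevance, toy_stance)
features = category_percentages(tweet, TOY_DICT)
print(label, features)
```

Keeping the classifiers as plain callables lets the same routing and counting code run unchanged whether the stand-ins are keyword rules, as here, or real model wrappers. The resulting per-tweet category percentages are the kind of predictors that could then enter a regression on retweet or favorite counts.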

Keywords: deep learning; diffusion of information; health informatics; regression analyses; social media; vaccine misinformation.

Associated data

Dryad/10.5061/dryad.d51c5b05j

Grants and funding

R21 LM013638/LM/NLM NIH HHS/United States