Social Media Data Mining of Antitobacco Campaign Messages: Machine Learning Analysis of Facebook Posts

J Med Internet Res. 2023 Feb 13:25:e42863. doi: 10.2196/42863.

Abstract

Background: Social media platforms provide a valuable source of public health information, as one-third of US adults seek specific health information online. Many antitobacco campaigns have recognized such trends among youth and have shifted their advertising time and effort toward digital platforms. Timely evidence is needed to inform the adaptation of antitobacco campaigns to changing social media platforms.

Objective: In this study, we conducted a content analysis of major antitobacco campaigns on Facebook using machine learning and natural language processing (NLP) methods, as well as a traditional approach, to investigate the factors that may influence effective antismoking information dissemination and user engagement.

Methods: We collected 3515 posts and 28,125 associated comments from 7 large national and local antitobacco campaigns on Facebook between 2018 and 2021, including the Real Cost, Truth, CDC Tobacco Free (formerly known as Tips from Former Smokers, where "CDC" refers to the Centers for Disease Control and Prevention), the Tobacco Prevention Toolkit, Behind the Haze VA, the Campaign for Tobacco-Free Kids, and Smoke Free US campaigns. NLP methods were used for content analysis, including parsimonious rule-based models for sentiment analysis and topic modeling. Logistic regression models were fitted to examine the relationship between antismoking message-framing strategies and viewer responses and engagement.
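To illustrate what a rule-based sentiment model does in general, the following is a minimal sketch in Python. It is not the model used in the study: the lexicon, the negation list, the threshold, and the function names `sentiment_score` and `frame` are all hypothetical and chosen only for illustration. Real rule-based tools rely on much larger, validated lexicons and richer rules.

```python
import re

# Hypothetical toy lexicon (word -> valence); an assumption for illustration only.
LEXICON = {
    "harmful": -2.0, "toxic": -2.0, "deadly": -3.0, "risk": -1.0,
    "healthy": 2.0, "free": 1.0, "proud": 2.0, "quit": 1.0,
}
# Hypothetical negation words that flip the valence of the next lexicon word.
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word valences, flipping the sign when a negator directly precedes the word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok, 0.0)
        if valence and i > 0 and tokens[i - 1] in NEGATORS:
            valence = -valence
        total += valence
    return total


def frame(text, threshold=0.0):
    """Label a post or comment as positive, negative, or neutral from its score."""
    score = sentiment_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in [
        "Cigarettes contain harmful, toxic chemicals",
        "Proud to be smoke free and healthy",
        "Vaping is not healthy",
    ]:
        print(frame(post), "|", post)
```

Labels produced this way for posts and for comments could then serve as the predictor and outcome variables in a regression model of the kind described above.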

Results: We found that large campaigns from government and nonprofit organizations had more user engagement than local and smaller campaigns. Facebook users were more likely to engage with negatively framed campaign posts. Negative posts tended to receive more negative comments (odds ratio [OR] 1.40, 95% CI 1.20-1.65). Positively framed posts generated more negative comments (OR 1.41, 95% CI 1.19-1.66) as well as positive comments (OR 1.29, 95% CI 1.13-1.48). Our content analysis and topic modeling showed that the most popular campaign posts tended to be informational (ie, providing new information), with key phrases referring to harmful chemicals (n=43, 43%) as well as the risk to pets (n=17, 17%).
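As a reading aid for the odds ratios above, the sketch below shows how an unadjusted OR and a Wald 95% CI can be computed from a 2x2 table. For a single binary predictor, a logistic regression with an intercept yields the same OR as this cross-product ratio. The counts are hypothetical and are not taken from the study, whose models may include additional covariates; the function name `odds_ratio_ci` is likewise an assumption.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: negative comments under negatively framed posts
    # versus under all other posts.
    or_, lo, hi = odds_ratio_ci(a=300, b=700, c=250, d=800)
    print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as in each of the reported results, indicates an association that is statistically distinguishable from no effect at the 5% level.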

Conclusions: Facebook users tend to engage more with antitobacco educational campaigns that are framed negatively. The most popular campaign posts are those providing new information, with key phrases and topics discussing harmful chemicals and risks of secondhand smoke for pets. Educational campaign designers can use such insights to increase the reach of antismoking campaigns and promote behavioral changes.

Keywords: Facebook; content analysis; engagement; natural language processing; public health; smoking; social media; social media campaign; tobacco; tobacco control; topic modeling; use; youth.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Advertising
  • Data Mining
  • Humans
  • Information Dissemination
  • Public Health
  • Social Media*