Predicting Smoking Prevalence in Japan Using Search Volumes in an Internet Search Engine: Infodemiology Study

J Med Internet Res. 2022 Dec 14;24(12):e42619. doi: 10.2196/42619.

Abstract

Background: Tobacco smoking is an important public health issue and a core indicator of public health policy worldwide. However, global pandemics and natural disasters have prevented surveys from being conducted.

Objective: The purpose of this study was to predict smoking prevalence by prefecture and sex in Japan using internet search trends.

Methods: This study used the infodemiology approach. The outcome variable was smoking prevalence by prefecture, obtained from national surveys. The predictor variables were the search volumes on Yahoo! Japan Search. We collected the search volumes for queries related to terms from the thesaurus of the Japanese medical article database Ichu-shi. Predictor variables were converted to per capita values and standardized as z scores. For smoking prevalence, the values for 2016 and 2019 were used, and for search volume, the values for the April 1 to March 31 fiscal year (FY) 1 year prior to the survey (ie, FY 2015 and FY 2018) were used. Partial correlation coefficients, adjusted for data year, were calculated between smoking prevalence and search volume, and a regression analysis using a generalized linear mixed model with random effects was conducted for each prefecture. Several models were tested, including one with all search queries, one obtained by a variable reduction method, and one that excluded cigarette product names. The best model was selected with the Akaike information criterion corrected for small sample size (AICC) and the Bayesian information criterion (BIC). We compared the predicted and actual smoking prevalence in 2016 and 2019 based on the best model and predicted the smoking prevalence in 2022.
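The abstract does not give formulas for these steps, so the following is only a minimal sketch of how the per capita z scores, the year-adjusted partial correlation, and the AICC and BIC scores could be computed using standard textbook definitions. All function names and the toy numbers are hypothetical and are not taken from the study; the use of the population standard deviation for the z scores is also an assumption.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and SD 1 (population SD; an assumption)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def aicc(log_lik, k, n):
    """AIC with the small-sample correction; k parameters, n observations."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion; k parameters, n observations."""
    return k * math.log(n) - 2 * log_lik


# Toy data (invented for illustration): three regions observed in two years.
searches = [1200, 900, 1500, 1100, 950, 1700]      # raw query counts
population = [100000, 80000, 120000, 100000, 80000, 120000]
year = [0, 0, 0, 1, 1, 1]                          # 0 = first survey, 1 = second
prevalence = [20.1, 18.5, 22.3, 19.0, 17.9, 21.5]  # percent

per_capita = [s / p for s, p in zip(searches, population)]
predictor = z_scores(per_capita)
r_partial = partial_corr(predictor, prevalence, year)
print(f"partial r adjusted for year: {r_partial:.3f}")
```

Lower AICC and BIC values indicate the preferred model when candidate models are fitted to the same data, so in practice `aicc` and `bic` would be evaluated once per candidate model with that model's log-likelihood and parameter count.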

Results: The partial correlation coefficients for men showed that 9 search queries had significant correlations with smoking prevalence, including cigarette (r=-0.417, P<.001), cigar in kanji (r=-0.412, P<.001), and cigar in katakana (r=-0.399, P<.001). For women, 5 search queries had significant correlations, including vape (r=0.335, P=.001), quitting smoking (r=0.288, P=.005), and cigar (r=0.286, P=.006). The models with all search queries were the best models for both AICC and BIC scores. Scatter plots of actual and estimated smoking prevalence in 2016 and 2019 confirmed a relatively high degree of agreement. The average estimated smoking prevalence in 2022 in the 47 prefectures for the total sample was 23.492% (95% CI 21.617%-25.367%), showing an increasing trend, with an average of 29.024% (95% CI 27.218%-30.830%) for men and 8.793% (95% CI 7.531%-10.054%) for women.

Conclusions: This study suggests that the search volume of tobacco-related queries in internet search engines can predict smoking prevalence by prefecture and sex in Japan. These findings will enable the development of low-cost, timely, and crisis-resistant health indicators that support the evaluation of health measures and contribute to improved public health.

Keywords: health indicator; health policy; health promotion; infodemiology; internet use; public health; quality indicators; search engine; smoking; smoking trend; tobacco use.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Female
  • Humans
  • Infodemiology*
  • Internet
  • Japan / epidemiology
  • Male
  • Prevalence
  • Search Engine*
  • Smoking / epidemiology
  • Tobacco Smoking