Predicting pulmonary tuberculosis incidence in China using Baidu search index: an ARIMAX model approach

Environ Health Prev Med. 2023:28:68. doi: 10.1265/ehpm.23-00141.

Abstract

Background: Existing research has established correlations between internet search data and epidemics of numerous infectious diseases. This study aimed to develop a prediction model and to explore the relationship between the pulmonary tuberculosis (PTB) epidemic trend in China and the Baidu search index.

Methods: The number of new PTB cases in China from January 2011 to August 2022 was collected. Spearman rank correlation and interaction analysis were used to identify Baidu keywords related to PTB and to construct a comprehensive PTB search index. The predictive performance of autoregressive integrated moving average (ARIMA) models and ARIMA models with an explanatory variable (ARIMAX) was then evaluated for the number of PTB cases.
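
As an illustration of the keyword-screening step, the following is a minimal sketch of a Spearman rank correlation between a search-index series and a case-count series. It is not the authors' code: the function names and the six data points are hypothetical, chosen only to show the calculation (rank both series, averaging ranks for ties, then take the Pearson correlation of the ranks).

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustrative values, NOT data from the study.
search_index = [120, 135, 150, 110, 160, 145]
new_cases = [900, 950, 1100, 870, 1000, 1050]

rho = spearman_rho(search_index, new_cases)
print(round(rho, 3))  # 0.829
```

In a screening workflow of this kind, a keyword would typically be retained only if its coefficient with the case series is high and statistically significant; the abstract does not state the threshold used.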

Results: The incidence of PTB showed a fluctuating downward trend. The Spearman rank correlation coefficient between the comprehensive PTB search index and PTB incidence was 0.834 (P < 0.001). The ARIMA model had an Akaike information criterion (AIC) value of 2804.41 and a mean absolute percentage error (MAPE) of 13.19%. The ARIMAX model incorporating the Baidu index had an AIC of 2761.58 and a MAPE of 5.33%.
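
For readers unfamiliar with the two comparison metrics, the sketch below shows how MAPE and AIC are conventionally computed. It assumes the standard definitions (MAPE as the mean of absolute percentage errors, AIC as 2k minus twice the log-likelihood); the abstract does not report the log-likelihoods, parameter counts, or forecast values, so the inputs to `mape` and `aic` here are hypothetical. Only the two AIC values in the final lines come from the abstract.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical values, NOT data from the study.
actual = [100, 200, 400]
predicted = [110, 190, 400]
example_mape = mape(actual, predicted)  # errors of 10%, 5%, 0%
print(round(example_mape, 2))  # 5.0

example_aic = aic(log_likelihood=-1400.0, n_params=3)
print(example_aic)  # 2806.0

# AIC values reported in the abstract: a lower AIC favors the ARIMAX model.
aic_arima, aic_arimax = 2804.41, 2761.58
aic_drop = aic_arima - aic_arimax
print(round(aic_drop, 2))  # 42.83
```

Both metrics point the same way in the reported results: ARIMAX has the lower AIC (better fit after penalizing complexity) and the lower MAPE (smaller relative prediction error).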

Conclusions: The ARIMAX model outperformed the ARIMA model in both goodness of fit and predictive accuracy. Additionally, the Baidu Index proved effective for predicting the number of PTB cases.

Keywords: ARIMA model; Internet search index; PTB; Predictive model.

MeSH terms

  • China / epidemiology
  • Humans
  • Incidence
  • Models, Statistical*
  • Tuberculosis, Pulmonary* / epidemiology