[Study on the ARIMA model application to predict echinococcosis cases in China]

Zhongguo Xue Xi Chong Bing Fang Zhi Za Zhi. 2018 Feb 26;30(1):47-53. doi: 10.16250/j.32.1374.2017173.
[Article in Chinese]

Abstract

Objective: To predict the monthly reported echinococcosis cases in China with the autoregressive integrated moving average (ARIMA) model, so as to provide a reference for prevention and control of echinococcosis.

Methods: ARIMA models were constructed in SPSS 24.0 from two time series of monthly reported echinococcosis cases, one covering 2007 to 2015 and the other 2007 to 2014, and the accuracies of the two models were compared.
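
The abstract does not state how the relative error was calculated or over which months. As a minimal sketch, assuming a per-month definition of (predicted - reported) / reported, expressed as a percentage and averaged over a hold-out period, the accuracy comparison could be written in Python as follows. The function names and the numbers in the usage lines are hypothetical and are not values from the study.

```python
from typing import List, Sequence


def monthly_relative_errors(reported: Sequence[float],
                            predicted: Sequence[float]) -> List[float]:
    """Per-month relative error in percent.

    Assumed sign convention: negative means the model under-predicts.
    """
    if len(reported) != len(predicted):
        raise ValueError("reported and predicted must have the same length")
    errors = []
    for r, p in zip(reported, predicted):
        if r == 0:
            raise ValueError("relative error is undefined when reported cases are 0")
        errors.append(100.0 * (p - r) / r)
    return errors


def mean_relative_error(reported: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average of the per-month relative errors, in percent."""
    errors = monthly_relative_errors(reported, predicted)
    if not errors:
        raise ValueError("at least one month is required")
    return sum(errors) / len(errors)


# Hypothetical two-month hold-out, for illustration only.
reported_cases = [100, 200]
predicted_cases = [90, 210]
print(monthly_relative_errors(reported_cases, predicted_cases))  # [-10.0, 5.0]
print(mean_relative_error(reported_cases, predicted_cases))      # -2.5
```

Averaging signed errors lets over- and under-predictions cancel, so a small mean does not by itself imply small monthly errors. Reporting the per-month values alongside the mean avoids that ambiguity.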

Results: The model fitted to the monthly reported cases of echinococcosis in China from 2007 to 2015 was ARIMA (1, 0, 0) (1, 1, 0)₁₂, with a relative error between reported and predicted cases of -13.97%, AR (1) = 0.367 (t = 3.816, P < 0.001), SAR (1) = -0.328 (t = -3.361, P = 0.001), and Ljung-Box Q = 14.119 (df = 16, P = 0.590). The model fitted to the monthly reported cases from 2007 to 2014 was ARIMA (1, 0, 0) (1, 0, 1)₁₂, with a relative error between reported and predicted cases of 0.56%, AR (1) = 0.413 (t = 4.244, P < 0.001), SAR (1) = 0.809 (t = 9.584, P < 0.001), SMA (1) = 0.356 (t = 2.278, P = 0.025), and Ljung-Box Q = 18.924 (df = 15, P = 0.217).
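
For readers less familiar with the seasonal ARIMA notation, the two fitted models can be written out with the backshift operator $B$ (where $B y_t = y_{t-1}$). This is a sketch under stated assumptions: it uses the Box-Jenkins sign convention, in which moving average terms enter with a minus sign, and it omits any constant term or data transformation, because the abstract reports neither. Here $y_t$ is the monthly reported case count and $\varepsilon_t$ is white noise.

```latex
% 2007-2015 series: ARIMA(1,0,0)(1,1,0)_{12}
% AR(1) = 0.367, SAR(1) = -0.328, one seasonal difference at lag 12
(1 - 0.367\,B)\,(1 + 0.328\,B^{12})\,(1 - B^{12})\,y_t = \varepsilon_t

% 2007-2014 series: ARIMA(1,0,0)(1,0,1)_{12}
% AR(1) = 0.413, SAR(1) = 0.809, SMA(1) = 0.356, no differencing
(1 - 0.413\,B)\,(1 - 0.809\,B^{12})\,y_t = (1 - 0.356\,B^{12})\,\varepsilon_t
```

The structural difference is that the first model removes seasonality by differencing at lag 12, whereas the second captures it through a strong seasonal autoregressive term (0.809) together with a seasonal moving average term.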

Conclusions: For the same infectious disease, different time series may yield different ARIMA models. Whether accumulating more data and shortening the prediction horizon reduce the average relative error needs further verification. Establishing and applying an ARIMA model is a dynamic process in which the model must be adjusted and optimized continuously as data accumulate; at the same time, full consideration should be given to the intensity of work that affects the number of reported cases of infectious diseases, such as disease censuses and special investigations.

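[Abstract] Objective: To predict the number of monthly reported echinococcosis cases nationwide (excluding Hong Kong, Macao and Taiwan) with the autoregressive integrated moving average (ARIMA) model, so as to provide a scientific reference for the prevention and control of echinococcosis. Methods: Using SPSS 24.0, optimal ARIMA models were built separately from the nationwide monthly reported echinococcosis cases for 2007-2015 and for 2007-2014, and the models were compared. Results: The optimal model for the nationwide monthly reported echinococcosis cases in 2007-2015 was ARIMA (1, 0, 0) (1, 1, 0)₁₂, with a relative prediction error of -13.97%, AR (1) = 0.367 (t = 3.816, P < 0.001), SAR (1) = -0.328 (t = -3.361, P = 0.001), and Ljung-Box Q = 14.119 (df = 16, P = 0.590). The optimal model for the nationwide monthly reported echinococcosis cases in 2007-2014 was ARIMA (1, 0, 0) (1, 0, 1)₁₂, with a relative prediction error of 0.56%, AR (1) = 0.413 (t = 4.244, P < 0.001), SAR (1) = 0.809 (t = 9.584, P < 0.001), SMA (1) = 0.356 (t = 2.278, P = 0.025), and Ljung-Box Q = 18.924 (df = 15, P = 0.217). Conclusions: Different time series may lead to different prediction models. Whether more accumulated data and a shorter prediction horizon give a smaller prediction error still needs further verification. Building and applying the model is a dynamic process that requires continual adjustment as data accumulate, while the influence of work that affects the number of reported infectious disease cases (such as censuses and special investigations) must be fully considered.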

Keywords: Autoregressive integrated moving average (ARIMA) model; Echinococcosis; Modeling; Monthly reported cases.

MeSH terms

  • China
  • Echinococcosis / diagnosis*
  • Forecasting*
  • Humans
  • Incidence
  • Models, Statistical*