Deep learning models for hepatitis E incidence prediction leveraging meteorological factors

PLoS One. 2023 Mar 13;18(3):e0282928. doi: 10.1371/journal.pone.0282928. eCollection 2023.

Abstract

Background: Infectious diseases are a major threat to public health, consuming substantial medical resources and causing casualties. Accurate prediction of infectious disease incidence is of great significance for public health organizations working to prevent the spread of disease. However, predictions based only on historical incidence data do not perform well. This study analyzes the influence of meteorological factors on the incidence of hepatitis E and uses these factors to improve the accuracy of incidence prediction.

Methods: We extracted monthly meteorological data, incidence, and case numbers of hepatitis E from January 2005 to December 2017 in Shandong province, China. We employed grey relational analysis (GRA) to analyze the correlation between incidence and meteorological factors. Using these meteorological factors, we built a variety of prediction models for hepatitis E incidence based on LSTM and attention-based LSTM. Data from July 2015 to December 2017 were used to validate the models, and the rest served as the training set. Three metrics were applied to compare model performance: root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
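The abstract gives no formulas, so the following is only a minimal sketch of the building blocks named above: a GRA score (commonly expanded as grey relational analysis), the three error metrics, and a chronological train/validation split. The function names, the mean normalisation, and the distinguishing coefficient rho = 0.5 are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np


def grey_relational_grades(reference, factors, rho=0.5):
    """Deng-style grey relational grade of each factor column vs. the reference.

    reference: shape (n,), e.g. monthly incidence.
    factors:   shape (n, m), one column per meteorological factor.
    rho:       distinguishing coefficient (0.5 is a common default, assumed here).
    """
    ref = np.asarray(reference, dtype=float)
    fac = np.asarray(factors, dtype=float)
    # Mean-normalise so series in different units are comparable (an assumption;
    # this breaks down for series whose mean is zero or negative).
    ref = ref / ref.mean()
    fac = fac / fac.mean(axis=0)
    delta = np.abs(fac - ref[:, None])       # pointwise distances, shape (n, m)
    d_min, d_max = delta.min(), delta.max()  # global extremes over all factors
    if d_max == 0.0:
        return np.ones(fac.shape[1])         # every factor matches the reference
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=0)                # one grade per factor, in (0, 1]


def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be non-zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def chronological_split(series, n_test):
    """Keep the last n_test observations for validation; no shuffling."""
    series = np.asarray(series)
    return series[:-n_test], series[-n_test:]


# January 2005 to December 2017 is 13 full years of monthly data.
N_MONTHS = 13 * 12
# July 2015 to December 2017 is 6 months of 2015 plus two full years.
N_TEST = 6 + 2 * 12

months = np.arange(N_MONTHS)  # placeholder index, not data from the study
train, test = chronological_split(months, N_TEST)
print(len(train), len(test))  # 126 30
```

Under these dates the split leaves 126 months for training and 30 for validation. A higher grade from `grey_relational_grades` marks a factor whose normalised curve tracks the incidence curve more closely.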

Results: Duration of sunshine and rainfall-related factors (total rainfall, maximum daily rainfall) are more relevant to the incidence of hepatitis E than other factors. Without meteorological factors, LSTM and A-LSTM obtained MAPEs of 20.74% and 19.50% for incidence, respectively. With meteorological factors, LSTM-All, MA-LSTM-All, TA-LSTM-All, and BiA-LSTM-All obtained MAPEs of 14.74%, 12.91%, 13.21%, and 16.83% for incidence, respectively. The prediction accuracy improved by 7.83 percentage points of MAPE (20.74% for LSTM versus 12.91% for MA-LSTM-All). Without meteorological factors, LSTM and A-LSTM achieved MAPEs of 20.41% and 19.39% for cases, respectively. With meteorological factors, LSTM-All, MA-LSTM-All, TA-LSTM-All, and BiA-LSTM-All achieved MAPEs of 14.20%, 12.49%, 12.72%, and 15.73% for cases, respectively. The prediction accuracy improved by 7.92 percentage points of MAPE (20.41% for LSTM versus 12.49% for MA-LSTM-All). More detailed results are shown in the results section of the paper.
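The two improvement figures can be checked against the MAPE values listed above. The short sketch below assumes that each figure is the plain LSTM baseline minus the lowest MAPE among all models; the abstract does not state this, but the arithmetic matches. The dictionary layout and the function name are illustrative only.

```python
# MAPE values (percent) exactly as reported in the abstract.
MAPE = {
    "incidence": {
        "LSTM": 20.74, "A-LSTM": 19.50,
        "LSTM-All": 14.74, "MA-LSTM-All": 12.91,
        "TA-LSTM-All": 13.21, "BiA-LSTM-All": 16.83,
    },
    "cases": {
        "LSTM": 20.41, "A-LSTM": 19.39,
        "LSTM-All": 14.20, "MA-LSTM-All": 12.49,
        "TA-LSTM-All": 12.72, "BiA-LSTM-All": 15.73,
    },
}


def improvement(scores, baseline="LSTM"):
    """Return the best model and its MAPE reduction vs. the baseline (points)."""
    best = min(scores, key=scores.get)  # lowest MAPE is best
    return best, round(scores[baseline] - scores[best], 2)


for target, scores in MAPE.items():
    best, gain = improvement(scores)
    print(f"{target}: best={best}, reduction={gain} percentage points")
```

On this reading, both figures are absolute differences in MAPE (percentage points), not relative reductions.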

Conclusions: The experiments show that attention-based LSTM is superior to the other models compared. Multivariate attention and temporal attention can greatly improve the prediction performance of the models; of the two, multivariate attention performs better when all meteorological factors are used. This study can serve as a reference for the prediction of other infectious diseases.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China / epidemiology
  • Deep Learning*
  • Hepatitis E*
  • Humans
  • Incidence
  • Meteorological Concepts

Grants and funding

This work was supported by Shandong Medical Health Science and Technology Development Programs (No. 2018WS309) to YF, Taishan Scholar Program of Shandong Province (No. tstp20221164) to LZ, ZhiFei Disease Prevention and Control Technology Research Fund Project (No. LYH2017-08) to YF, Science and Technology Project for the Universities of Shandong Province (No. J18KB171) to YG, and Shandong Women’s University High level scientific research project Cultivation Fund (No. 2020GSPGJ08) to YG. YF: Conceptualization; Data curation; Methodology; Writing original draft. LZ: Data curation; Writing review and editing. YG: Conceptualization; Methodology; Writing review and editing.