The Prediction of Hepatitis E through Ensemble Learning

Int J Environ Res Public Health. 2020 Dec 28;18(1):159. doi: 10.3390/ijerph18010159.

Abstract

According to the World Health Organization, about 20 million people are infected with the hepatitis E virus (HEV) every year. In 2015, HEV infection caused 44,000 deaths worldwide. Food, water and climate are key factors affecting hepatitis E outbreaks. This paper presents an ensemble learning model for hepatitis E prediction, built by studying the correlation between historical hepatitis E cases and environmental factors (water quality and meteorological data). Environmental factors comprise many features; those most relevant to HEV are selected and fed into an ensemble learning model composed of Gradient Boosting Decision Tree (GBDT) and Random Forest for training and prediction. Three indicators, root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), are used to evaluate the ensemble learning model against the classical time series prediction model. The ensemble learning model is found to predict better than the classical model, and prediction can be improved further by exploiting water quality and meteorological factors (radiation, air pressure, precipitation).
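
For readers unfamiliar with the three evaluation indicators named above, the following is a minimal sketch of their standard definitions in plain Python. It is not the authors' code: the function names and the sample case counts are hypothetical and chosen only for illustration, and MAPE is expressed here in percent, which is a common convention but an assumption about how the paper reports it.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero (division by zero).
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Hypothetical case counts, for illustration only (not data from the paper).
actual = [10, 20, 40]
predicted = [12, 18, 44]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

Lower values indicate a better fit for all three indicators. RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which makes it easier to compare across series with different case volumes.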

Keywords: ensemble learning; hepatitis E; prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Climate*
  • Disease Outbreaks
  • Hepatitis E* / epidemiology
  • Humans
  • Machine Learning*
  • Models, Theoretical
  • Water Quality*