A Heterogeneous Ensemble Forecasting Model for Disease Prediction

New Gener Comput. 2021;39(3-4):701-715. doi: 10.1007/s00354-020-00119-7. Epub 2021 Jan 4.

Abstract

This manuscript presents a bragging-based (bootstrap robust aggregating) ensemble forecasting model for predicting the number of cases of a disease from past occurrences. The objectives of this work are to enhance accuracy, reduce overfitting, and handle overdrift; the proposed model shows promising results in terms of error metrics. The disease data, covering 2010 to 2019, were collected from the official Hong Kong government website. Preprocessing uses a log transformation followed by a z-score transformation. The proposed ensemble model is then applied, and its applicability to each disease dataset is presented. It is compared against three existing ensemble models, namely dynamic ensemble for time series, arbitrated dynamic ensemble, and random forest, using several error metrics. The proposed model reduces the mean absolute error (MAE) by 27.18%, 3.07%, 11.58%, and 13.46% for tuberculosis, dengue, food poisoning, and chickenpox, respectively. The comparison shows that the proposed ensemble model gives better accuracy than the existing models on all four disease datasets.

Keywords: Bootstrapping; Bragging; Disease forecasting; Ensemble; Time series forecasting.
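
The pipeline named in the abstract (log and z-score preprocessing, bootstrap resampling, robust aggregation, MAE scoring) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base learners, so a single least-squares AR(1) model stands in for the heterogeneous ensemble members, the median is assumed as the robust aggregator, `log1p` is assumed so that zero counts are tolerated, and all function names and the sample counts are hypothetical.

```python
import math
import random
import statistics


def log_zscore(series):
    """Log-transform counts, then standardize to zero mean and unit variance.

    Assumes the series is not constant (otherwise sigma would be zero).
    """
    logged = [math.log1p(x) for x in series]
    mu = statistics.fmean(logged)
    sigma = statistics.pstdev(logged)
    return [(v - mu) / sigma for v in logged], mu, sigma


def invert(z, mu, sigma):
    """Map a standardized value back to the original count scale."""
    return math.expm1(z * sigma + mu)


def fit_ar1(pairs):
    """Least-squares fit of next = a + b * previous (stand-in base learner)."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to predicting the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bragging_forecast(series, n_boot=200, seed=0):
    """One-step-ahead forecast: fit on bootstrap resamples, take the median."""
    rng = random.Random(seed)
    z, mu, sigma = log_zscore(series)
    pairs = list(zip(z[:-1], z[1:]))  # (previous, next) training pairs
    preds = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=len(pairs))  # resample with replacement
        a, b = fit_ar1(sample)
        preds.append(a + b * z[-1])
    # The median (rather than the mean used in plain bagging) is the robust step.
    return invert(statistics.median(preds), mu, sigma)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return statistics.fmean([abs(a - p) for a, p in zip(actual, predicted)])


if __name__ == "__main__":
    monthly_cases = [12, 15, 14, 18, 20, 19, 23, 25]  # hypothetical counts
    print(bragging_forecast(monthly_cases))
```

Resampling (previous, next) pairs is a simplification that ignores longer-range temporal dependence; a block bootstrap would be a natural alternative for real surveillance series.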