Predicting malaria outbreak in The Gambia using machine learning techniques

PLoS One. 2024 May 16;19(5):e0299386. doi: 10.1371/journal.pone.0299386. eCollection 2024.

Abstract

Malaria is the most common cause of death among the parasitic diseases and continues to pose a growing threat to public health and economic growth in tropical and subtropical countries. This study addresses this challenge by developing a predictive model for malaria outbreaks in each district of The Gambia, leveraging historical meteorological data. To this end, we compare the performance of eight machine learning algorithms: C5.0 decision trees, artificial neural networks, k-nearest neighbors, support vector machines with linear and radial kernels, logistic regression, extreme gradient boosting, and random forests. During the training phase, the models are evaluated using 10-fold cross-validation repeated five times to ensure robust validation. Extreme gradient boosting and decision trees achieve the highest prediction accuracy on the testing set (93.3%), followed closely by random forests (91.5%). In contrast, the support vector machine with a linear kernel performs less favorably, with a prediction accuracy of 84.8% and weaker specificity. Notably, integrating both climatic and non-climatic features proves crucial for accurately predicting malaria outbreaks in The Gambia.
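
The evaluation protocol described in the abstract (10-fold cross-validation repeated five times, scored by accuracy and specificity) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the two features are hypothetical stand-ins for climatic predictors, the helper names (`repeated_kfold`, `knn_predict`, `accuracy_and_specificity`) are invented for illustration, and a simple k-nearest-neighbors classifier stands in for any of the eight algorithms. The folds are plain shuffled folds, not stratified ones.

```python
import random
from statistics import mean

def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k roughly equal-sized folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)),
    )
    votes = sum(train_y[j] for j in by_distance[:k])
    return 1 if 2 * votes > k else 0

def accuracy_and_specificity(y_true, y_pred):
    """Accuracy = correct / total; specificity = TN / (TN + FP), None if no negatives."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    specificity = tn / (tn + fp) if (tn + fp) else None
    return correct / len(pairs), specificity

# Synthetic, hypothetical data: two features per record, label 1 = outbreak.
rng = random.Random(42)
X, y = [], []
for label in (0, 1):
    for _ in range(100):
        center = 2.0 * label
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)

fold_acc, fold_spec = [], []
for train, test in repeated_kfold(len(X), k=10, repeats=5, seed=1):
    train_X = [X[j] for j in train]
    train_y = [y[j] for j in train]
    preds = [knn_predict(train_X, train_y, X[j]) for j in test]
    acc, spec = accuracy_and_specificity([y[j] for j in test], preds)
    fold_acc.append(acc)
    if spec is not None:  # skip folds that happen to contain no negative cases
        fold_spec.append(spec)

mean_accuracy = mean(fold_acc)
mean_specificity = mean(fold_spec)
print(f"folds evaluated: {len(fold_acc)}")
print(f"mean accuracy: {mean_accuracy:.3f}, mean specificity: {mean_specificity:.3f}")
```

Averaging over 5 x 10 = 50 held-out folds gives a more stable performance estimate than a single split, which is presumably why the repeated scheme was chosen. Reporting specificity alongside accuracy shows how well a model identifies non-outbreak periods, the measure on which the linear-kernel support vector machine is reported to underperform.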

MeSH terms

  • Algorithms
  • Disease Outbreaks*
  • Gambia / epidemiology
  • Humans
  • Machine Learning*
  • Malaria* / epidemiology
  • Neural Networks, Computer
  • Support Vector Machine*

Grants and funding

This work was supported by the Deanship of Research Oversight and Coordination at King Fahd University of Petroleum and Minerals. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.