Predicting the Olea pollen concentration with a machine learning algorithm ensemble

Int J Biometeorol. 2021 Apr;65(4):541-554. doi: 10.1007/s00484-020-02047-z. Epub 2020 Nov 13.

Abstract

According to the World Health Organization, air pollution in large cities causes numerous diseases and millions of deaths annually. Pollen exposure is related to allergic diseases, which makes its prediction a valuable tool for assessing the risk of exposure to aeroallergens. However, airborne pollen concentrations are difficult to predict because of the inherent complexity of the relationships among biotic and environmental variables. In this work, a stochastic approach based on supervised machine learning algorithms was used to forecast the daily Olea pollen concentrations in the Community of Madrid, central Spain, from 1993 to 2018. Firstly, individual Light Gradient Boosting Machine (LightGBM) and artificial neural network (ANN) models were applied to predict the day of the year (DOY) on which the peak of the pollen season occurs. The estimated average peak dates were 149.1 ± 9.3 DOY for LightGBM and 150.1 ± 10.8 DOY for ANN, close to the observed value (148.8 ± 9.8 DOY). Secondly, the daily pollen concentrations during the entire pollen season were calculated using an ensemble of a two-step GAM followed by LightGBM and ANN. The predictions of daily pollen concentrations showed a coefficient of determination (r²) above 0.75 (goodness of fit of the model following cross-validation). The predictors included in the ensemble models were meteorological variables, phenological metrics, site-specific characteristics, and preceding pollen concentrations. The models are state-of-the-art in machine learning, and they showed potential for deployment to understand and predict pollen risk levels during the main olive pollen season.
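
The coefficient of determination reported above can be illustrated with a minimal sketch. It assumes that two models' daily predictions are combined by an equal-weight average, which is an illustrative choice and not necessarily how the authors built their ensemble. The function names and all numeric values are hypothetical and are not data from the study. The sketch scores a single set of predictions and does not perform cross-validation.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def ensemble_mean(*model_predictions):
    """Equal-weight average of several models' per-day predictions (an assumption)."""
    return [mean(day) for day in zip(*model_predictions)]


# Hypothetical daily pollen concentrations, for illustration only.
observed = [10.0, 40.0, 120.0, 80.0, 20.0]
model_a = [12.0, 35.0, 110.0, 90.0, 25.0]  # stand-in for a boosted-tree model
model_b = [10.0, 41.0, 126.0, 76.0, 19.0]  # stand-in for a neural network

combined = ensemble_mean(model_a, model_b)
score = r_squared(observed, combined)
print(f"ensemble r2 = {score:.4f}")
```

In a cross-validated setting, the same score would be computed on held-out days for each fold and then summarized across folds.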

Keywords: Air quality; Boosted trees; Neural networks; Pollen exposure; Pollen prediction.

MeSH terms

  • Air Pollutants* / analysis
  • Allergens / analysis
  • Environmental Monitoring
  • Machine Learning
  • Olea*
  • Pollen / chemistry
  • Seasons
  • Spain

Substances

  • Air Pollutants
  • Allergens