Prediction of meteorological drought and standardized precipitation index based on the random forest (RF), random tree (RT), and Gaussian process regression (GPR) models

Environ Sci Pollut Res Int. 2023 Mar;30(15):43183-43202. doi: 10.1007/s11356-023-25221-3. Epub 2023 Jan 17.

Abstract

Agriculture, meteorological, and hydrological drought is a natural hazard which affects ecosystems in the central India of Maharashtra state. Due to limited historical data for drought monitoring and forecasting available in the central India of Maharashtra state, implementing machine learning (ML) algorithms could allow for the prediction of future drought events. In this paper, we have focused on the prediction accuracy of meteorological drought in the semi-arid region based on the standardized precipitation index (SPI) using the random forest (RF), random tree (RT), and Gaussian process regression (GPR-PUK kernel) models. A different combination of machine learning models and variables has been performed for the forecasting of metrological drought based on the SPI-6 and 12 months. Models were developed using monthly rainfall data for the period of 2000-2019 at two meteorological stations, namely, Karanjali and Gangawdi, each representing a geographical region of Upper Godavari river basin area in the central India of Maharashtra state which frequently experiences droughts. Historical data from the SPI from 2000 to 2013 was processed to train the model into machine learning model, and the rest of the 2014 to 2019-year data were used for testing to forecast the SPI and metrological drought. The mean square error (MSE), root mean square error (RMSE), adjusted R2, Mallows' (Cp), Akaike's (AIC), Schwarz's (SBC), and Amemiya's PC were used to identify the best combination input model and best subregression analysis for both stations of SPI-6 and 12. The correlation coefficient ([Formula: see text]), mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE) were used to perform evaluation for SPI-6 and 12 months of both stations with RF, RT, and GPR-PUK kernel models during the training and testing scenarios. The results during testing phase revealed that the RF was found as the best model in forecasting droughts with values of [Formula: see text], MAE, RMSE, RAE (%), and RRSE (%) being 0.856, 0.551, 0.718, 74.778, and 54.019, respectively, for SPI-6 while 0.961, 0.361, 0.538, 34.926, and 28.262, respectively, for SPI-12 scales at Gangawdi station. Further, the respective values of evaluators at Karanjali station were 0.913 and 0.966, 0.541 and 0.386, 0.604 and 0.589, 52.592 and 36.959, and 42.315 and 31.394 for PUK kernel and RT models, respectively, during SPI-6 and SPI-12. Machine learning models are potential drought warning techniques because they take less time, have fewer inputs, and are less sophisticated than dynamic or scientific models.

Keywords: Drought; Machine learning; Random forest (RF); Standardized precipitation index (SPI).

MeSH terms

  • Algorithms
  • Droughts*
  • Ecosystem
  • India
  • Random Forest*