Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: Application of the simulated annealing feature selection method

Farzaneh Sajedi Hosseini; Bahram Choubin; Amir Mosavi; Narjes Nabipour; Shahaboddin Shamshirband; Hamid Darabi; Ali Torabi Haghighi

doi:10.1016/j.scitotenv.2019.135161

Sci Total Environ. 2020 Apr 1:711:135161. doi: 10.1016/j.scitotenv.2019.135161. Epub 2019 Nov 21.

Authors

Farzaneh Sajedi Hosseini¹, Bahram Choubin², Amir Mosavi³, Narjes Nabipour⁴, Shahaboddin Shamshirband⁵, Hamid Darabi⁶, Ali Torabi Haghighi⁶

Affiliations

¹ Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Karaj, Iran.
² Soil Conservation and Watershed Management Research Department, West Azarbaijan Agricultural and Natural Resources Research and Education Center, AREEO, Urmia, Iran.
³ School of the Built Environment, Oxford Brookes University, Oxford OX3 0BP, UK; Kalman Kando Faculty of Electrical Engineering, Obuda University, Budapest, Hungary.
⁴ Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam. Electronic address: narjesnabipour@duytan.edu.vn.
⁵ Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Viet Nam. Electronic address: shahaboddin.shamshirband@tdtu.edu.vn.
⁶ Water, Energy and Environmental Engineering Research Unit, University of Oulu, P.O. Box 4300, FIN-90014 Oulu, Finland.

PMID: 31818576
DOI: 10.1016/j.scitotenv.2019.135161

Abstract

Flash-floods are increasingly recognized as a frequent natural hazard worldwide, and Iran has been among the regions most devastated by major floods. While temporal flash-flood forecasting models are mainly developed for warning systems, models for assessing hazardous areas can contribute greatly to adaptation and mitigation policy-making and to disaster risk reduction. Previous research on flash-flood hazard mapping has underscored the need for more accurate models. The current research therefore proposes state-of-the-art ensemble models, namely the boosted generalized linear model (GLMBoost) and random forest (RF), together with the Bayesian generalized linear model (BayesGLM), for higher-performance modeling. Furthermore, a pre-processing method, simulated annealing (SA), is used to eliminate redundant variables from the modeling process. Results of the modeling, based on hit and miss analysis, indicate high performance for both models (accuracy = 90-92%, Kappa = 79-84%, success ratio = 94-96%, threat score = 80-84%, and Heidke skill score = 79-84%). Distance from the stream, vegetation, drainage density, land use, and elevation contributed more than the other variables to modeling flash-floods. The results of this study can significantly facilitate mapping of hazardous areas and further assist watershed managers in controlling and remediating flood-induced damage in data-scarce regions.
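
The hit and miss scores reported in the abstract are all conventionally derived from a 2x2 contingency table of predicted versus observed flood locations. The sketch below shows the standard textbook definitions of these scores. It is not the authors' code: the names `ContingencyTable` and `hit_miss_scores` are hypothetical, and the counts in the usage lines are illustrative and not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 counts for a binary flood / non-flood classification (hypothetical helper)."""
    hits: int               # flood predicted, flood observed
    false_alarms: int       # flood predicted, no flood observed
    misses: int             # no flood predicted, flood observed
    correct_negatives: int  # no flood predicted, no flood observed


def hit_miss_scores(t: ContingencyTable) -> dict:
    """Return the five scores named in the abstract, as fractions in [0, 1] (or [-1, 1])."""
    a, b, c, d = t.hits, t.false_alarms, t.misses, t.correct_negatives
    n = a + b + c + d

    accuracy = (a + d) / n
    success_ratio = a / (a + b)        # share of predicted floods that were observed
    threat_score = a / (a + b + c)     # also known as the critical success index

    # Heidke skill score: improvement over the agreement expected by chance.
    heidke = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))

    # Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
    chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (accuracy - chance) / (1 - chance)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "success_ratio": success_ratio,
        "threat_score": threat_score,
        "heidke_skill_score": heidke,
    }


# Illustrative counts only (assumed, not from the paper).
example = ContingencyTable(hits=40, false_alarms=5, misses=10, correct_negatives=45)
scores = hit_miss_scores(example)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

For a 2x2 table, Cohen's kappa and the Heidke skill score are algebraically the same quantity, which is consistent with the identical 79-84% ranges reported for both in the abstract.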

Keywords: Bayesian; Ensemble machine learning; Flash-flood; Hazard; Simulated annealing.