Parameter importance assessment improves efficacy of machine learning methods for predicting snow avalanche sites in Leh-Manali Highway, India

Sci Total Environ. 2021 Nov 10:794:148738. doi: 10.1016/j.scitotenv.2021.148738. Epub 2021 Jun 29.

Abstract

Due to ongoing climate change, water mass redistribution and related hazards are becoming stronger and more frequent. Predicting extreme hydrological events and related hazards is therefore one of the highest priorities in geosciences. Machine Learning (ML) methods have shown promise for this task. Every ML method requires training on cases where both the output (the extreme event) and the input (relevant physical parameters and variables) are known. This step is critical to the efficacy of the ML method. The usual approach is to include a wide variety of hydro-meteorological observations and physical parameters, but recent advances in ML indicate that efficacy may not improve as the number of input parameters increases. In fact, including unimportant parameters decreases the efficacy of ML algorithms. It is therefore imperative to identify the most relevant parameters prior to training. In this study, we demonstrate this concept by predicting avalanche susceptibility along the Leh-Manali Highway (one of the most severely affected regions in India) with and without Parameter Importance Assessment (PIA). The avalanche locations were randomly divided into two groups: 70% for training and 30% for testing. Then, based on temporal and spatial sensor data, eleven avalanche-influencing parameters were considered. The Boruta algorithm, an extension of the Random Forest (RF) ML method that uses the importance measure to rank predictors, was applied and found nine of the eleven parameters to be important. A Support Vector Machine (SVM) based ML technique was used for avalanche prediction and, to be comprehensive, four different kernel functions were employed: linear, polynomial, sigmoid, and radial basis function (RBF). With all eleven parameters, the prediction accuracies for the linear, polynomial, sigmoid, and RBF kernels were 80.4%, 81.7%, 39.2%, and 85.7%, respectively. With only the selected parameters, the prediction accuracies for the linear, polynomial, sigmoid, and RBF kernels were 84.1%, 86.6%, 43.0%, and 87.8%, respectively. We also identified locations where avalanches are most likely to occur. We conclude that parameter selection should be considered when applying ML methods in geosciences.
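
The accuracy figures reported in the abstract can be compared kernel by kernel with a few lines of arithmetic. The following is a minimal sketch in Python, not the authors' code: the accuracy values are taken from the abstract, while the function names, the random seed, and the 100 placeholder site IDs are assumptions introduced only for illustration. It also shows one plausible way to make a random 70%/30% split of the kind described.

```python
import random

# Prediction accuracies (%) reported in the abstract, per SVM kernel.
ALL_PARAMS = {"linear": 80.4, "polynomial": 81.7, "sigmoid": 39.2, "rbf": 85.7}
SELECTED_PARAMS = {"linear": 84.1, "polynomial": 86.6, "sigmoid": 43.0, "rbf": 87.8}


def accuracy_gains(before, after):
    """Return the accuracy gain in percentage points for each kernel."""
    return {kernel: round(after[kernel] - before[kernel], 1) for kernel in before}


def split_70_30(items, seed=0):
    """Randomly split items into 70% training and 30% testing subsets.

    The seed is a hypothetical choice for reproducibility; the abstract
    does not say how the random split was seeded.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


gains = accuracy_gains(ALL_PARAMS, SELECTED_PARAMS)
for kernel, gain in gains.items():
    print(f"{kernel:<10} +{gain:.1f} percentage points")

# Placeholder site IDs (hypothetical), used only to show the split sizes.
train, test = split_70_30(range(100))
print(len(train), "training sites,", len(test), "testing sites")
```

On the reported numbers, every kernel improves after parameter selection: the polynomial kernel gains the most (4.9 percentage points) and the RBF kernel the least (2.1), although RBF remains the most accurate in both settings.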

Keywords: Avalanche susceptibility modeling; Boruta algorithm; Machine learning (ML); Parameter Importance Assessment (PIA); Support Vector Machine (SVM).

MeSH terms

  • Algorithms
  • Avalanches*
  • India
  • Machine Learning
  • Snow
  • Support Vector Machine