A comparative study of heterogeneous and homogeneous ensemble approaches for landslide susceptibility assessment in the Djebahia region, Algeria

Environ Sci Pollut Res Int. 2023 Mar 9. doi: 10.1007/s11356-023-26247-3. Online ahead of print.

Abstract

This study aims to compare the performance of ensembles according to their inherent diversity in the context of landslide susceptibility assessment. Two ensemble types can be distinguished, heterogeneous and homogeneous; four ensembles of each type were implemented in the Djebahia region. The heterogeneous ensembles include stacking (ST), voting (VO), weighting (WE), and a new approach in landslide assessment called meta-dynamic ensemble selection (DES), while the homogeneous ensembles include AdaBoost (ADA), bagging (BG), random forest (RF), and random subspace (RSS). To ensure a consistent comparison, each ensemble was implemented using individual base learners. The heterogeneous ensembles were generated by combining eight different machine learning algorithms, while the homogeneous ensembles used only a single base learner, with diversity achieved through resampling the training dataset. The spatial dataset used in this study consisted of 115 landslide events and 12 conditioning factors, which were randomly divided into training and testing datasets. The models were evaluated from several angles, including receiver operating characteristic (ROC) curves, root mean squared error (RMSE), landslide density distribution (LDD), threshold-dependent metrics (Kappa index, accuracy, and recall scores), and a global visual representation using the Taylor diagram. Additionally, a sensitivity analysis (SA) was conducted for the best-performing models to assess the importance of the factors and the resilience of the ensembles. The results revealed that homogeneous ensembles outperformed heterogeneous ensembles in terms of AUC and threshold-dependent metrics, with AUC ranging from 0.962 to 0.971 for the test dataset. ADA was the best-performing model on these metrics but the weakest in terms of RMSE (0.366). In contrast, the heterogeneous ensemble ST achieved a lower RMSE (0.272), and DES showed the best LDD, indicating a stronger potential to generalize the phenomenon. The Taylor diagram was consistent with the other results, indicating that ST was the best-performing model, followed by RSS. The SA demonstrated that RSS was the most robust (mean AUC variation of -0.022) and ADA was the least robust (mean AUC variation of -0.038).
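
As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch, not the authors' implementation, of how AUC, RMSE, accuracy, recall, and the Kappa index can be computed from predicted susceptibility scores. The function names, the toy labels and scores, and the 0.5 classification threshold are all assumptions made for this example; none of them come from the study.

```python
import math

def roc_auc(y_true, scores):
    """AUC as the fraction of (landslide, non-landslide) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(y_true, scores):
    """Root mean squared error between binary labels and predicted scores."""
    return math.sqrt(sum((y - s) ** 2 for y, s in zip(y_true, scores)) / len(y_true))

def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, recall, and Cohen's kappa after binarising scores at `threshold`.

    Assumes both classes are present and chance agreement is below 1.
    """
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    n = len(y_true)
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)
    # Agreement expected by chance, from the marginal totals of the confusion matrix.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "recall": recall, "kappa": kappa}

# Hypothetical test set: 1 = landslide cell, 0 = non-landslide cell.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc_value = roc_auc(y_true, scores)
rmse_value = rmse(y_true, scores)
metrics = threshold_metrics(y_true, scores)
print(auc_value, rmse_value, metrics)
```

AUC depends only on how the scores rank the cells, whereas RMSE depends on how close the scores are to 0 and 1. This is why a model can rank best on AUC and still have a higher RMSE than another model, as reported here for ADA and ST.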

Keywords: Djebahia Algeria; Heterogeneous; Homogeneous ensembles; Landslide susceptibility mapping; Machine learning.