Applying XGBoost and SHAP to Open Source Data to Identify Key Drivers and Predict Likelihood of Wolf Pair Presence

Environ Manage. 2024 May;73(5):1072-1087. doi: 10.1007/s00267-024-01941-1. Epub 2024 Feb 19.

Abstract

Wolves have been returning to Germany since 2000, and numbers grew to 209 territorial pairs in 2021. XGBoost machine learning, combined with SHAP analysis, was applied to predict German wolf pair presence in 2022 for 10 × 10 km grid cells. Model input consisted of 38 variables from open sources, covering the period 2000 to 2021. The XGBoost model predicted well, with an AUC of 0.91. SHAP analysis ranked the variables: distance to the closest neighboring wolf pair was the main driver for a grid cell to become occupied by a wolf pair. The clustering tendency of related wolves seems to be an important explanatory factor here. Second was the percentage of wooded area. The next eight variables related to wolf presence in the preceding year, with three exceptions at the fifth, eighth and tenth positions in the overall order: human density (square root) in the grid cell, percentage of arable land and road density, respectively. Other variables, including the occurrence of wild prey, were the weakest predictors. The SHAP analysis also provided crucial added value by identifying variables with threshold values at which their contribution to the prediction changed from positive to negative or vice versa. For instance, a low density of people increased the probability of wolf pair presence, whereas a high density decreased this probability. Cumulative lift techniques showed that the model performed almost four times better than random prediction. The combination of XGBoost, SHAP and cumulative lift techniques is new in wolf management and conservation, and allows educational and financial resources to be focused.
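To make the cumulative lift comparison concrete, the following is a minimal sketch of how such a figure can be computed from predicted scores. The abstract does not describe the authors' exact procedure, so the function name `cumulative_lift`, the choice of fraction, and the toy data are all illustrative assumptions, not the study's method or results.

```python
def cumulative_lift(y_true, y_score, fraction):
    """Cumulative lift for the top `fraction` of grid cells ranked by score.

    Lift = (share of occupied cells among the top-ranked cells)
           / (share of occupied cells among all cells).
    A lift of 1.0 corresponds to random selection.
    NOTE: hypothetical helper for illustration; not taken from the paper.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(y_true)
    total_pos = sum(y_true)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one cell and one occupied cell")

    # Rank cell indices by predicted score, highest first.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)

    # Number of top-ranked cells to inspect (at least one).
    k = max(1, round(fraction * n))
    top_pos = sum(y_true[i] for i in order[:k])

    return (top_pos / k) / (total_pos / n)


# Toy data (made up): 10 grid cells, 2 of them occupied by a wolf pair.
occupied = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.1, 0.8, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35]

lift_top20 = cumulative_lift(occupied, scores, 0.2)
lift_top50 = cumulative_lift(occupied, scores, 0.5)
lift_all = cumulative_lift(occupied, scores, 1.0)
```

In this toy example the two highest-scoring cells are both occupied, so inspecting the top 20% of cells finds occupied cells five times as often as picking cells at random; the lift falls to 1.0 once all cells are included. A lift of "almost four" would be read the same way for whatever fraction of cells the authors evaluated.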

Keywords: Ecological model; Geospatial analysis; Machine learning model; SHAP; Wolf; XGBoost.

MeSH terms

  • Animals
  • Germany
  • Humans
  • Probability
  • Wolves*