An interpretable machine learning method for supporting ecosystem management: Application to species distribution models of freshwater macroinvertebrates

J Environ Manage. 2021 Aug 1:291:112719. doi: 10.1016/j.jenvman.2021.112719. Epub 2021 May 1.

Abstract

Species distribution models (SDMs), in which species occurrences are related to a suite of environmental variables, have been used as decision-making tools in ecosystem management. Complex machine learning (ML) algorithms that lack interpretability may hinder the use of SDMs for ecological explanation, possibly limiting their role as decision-support tools. To meet the growing demand for explainable ML, several interpretable ML methods have recently been proposed. Among these methods, SHapley Additive exPlanations (SHAP) has drawn attention for its robust theoretical justification and analytical gains. In this study, the utility of SHAP was demonstrated by applying it to SDMs of four benthic macroinvertebrate species. In addition to species responses, the dataset contained 22 environmental variables monitored at 436 sites across five major rivers of South Korea. A range of ML algorithms was employed for model development. Each ML model was trained and optimized using 10-fold cross-validation. Model evaluation based on the test dataset indicated strong model performance, with scores of ≥0.7 on all evaluation metrics for all ML algorithms and species. However, only the random forest algorithm showed behavior consistent with the known ecology of the investigated species. SHAP presents an integrated framework in which local interpretations that incorporate local interaction effects are combined to represent the global model structure. Consequently, this framework offered a novel opportunity to assess the importance of variables in predicting species occurrence, not only across sites but also for individual sites. Furthermore, removing interaction effects from the variable importance values (SHAP values) clearly revealed non-linear species responses to variations in environmental variables, indicating the existence of ecological thresholds. This study provides guidelines for the use of a new interpretable method supporting ecosystem management.
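
The way local SHAP values combine into a global picture can be illustrated with a minimal sketch. The code below is not the study's pipeline: the occurrence score `model`, its three variables (temperature, velocity, BOD), and the `BACKGROUND` sites are hypothetical, chosen only so that exact Shapley values can be computed by brute force without any external library. It shows the per-site (local) attribution and a global importance taken as the mean absolute SHAP value across sites.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy occurrence score (an assumption, not the study's model):
# a temperature threshold plus a velocity effect and a velocity-BOD interaction.
def model(x):
    temp, velocity, bod = x
    return (1.0 if temp < 20.0 else 0.0) + 0.5 * velocity - 0.1 * velocity * bod

# Hypothetical background sites used to "switch off" a variable
# by averaging the model over the values it takes elsewhere.
BACKGROUND = [
    (15.0, 0.2, 1.0),
    (18.0, 0.8, 2.0),
    (22.0, 0.5, 4.0),
    (25.0, 1.0, 3.0),
]

def coalition_value(x, subset, background=BACKGROUND):
    """Mean prediction when variables in `subset` are fixed to the site's
    values and the remaining variables are taken from each background site."""
    total = 0.0
    for b in background:
        mixed = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += model(mixed)
    return total / len(background)

def shapley_values(x, background=BACKGROUND):
    """Exact Shapley values for one site (local interpretation)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude variable i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = (coalition_value(x, s | {i}, background)
                        - coalition_value(x, s, background))
                phi[i] += weight * gain
    return phi

def global_importance(sites, background=BACKGROUND):
    """Mean absolute Shapley value per variable across sites (global view)."""
    n = len(sites[0])
    totals = [0.0] * n
    for x in sites:
        for i, p in enumerate(shapley_values(x, background)):
            totals[i] += abs(p)
    return [t / len(sites) for t in totals]
```

For any single site, the values returned by `shapley_values` sum to the difference between that site's prediction and the mean prediction over the background sites, which is the additivity property that lets per-site explanations be aggregated. Brute-force enumeration is only feasible for a handful of variables; with 22 variables, as in the study, an efficient tree-specific algorithm would be needed instead.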

Keywords: EPT taxa; Interpretable machine learning; Macroinvertebrate; SHAP; Species distribution model; Tree-based model.

MeSH terms

  • Ecosystem*
  • Fresh Water
  • Machine Learning*
  • Republic of Korea
  • Rivers