An illustration of model agnostic explainability methods applied to environmental data

Environmetrics. 2023 Feb;34(1):e2772. doi: 10.1002/env.2772. Epub 2022 Oct 25.

Abstract

Historically, two primary criticisms that statisticians have had of machine learning and deep neural models are their lack of uncertainty quantification and their inability to do inference (i.e., to explain which inputs are important). Explainable AI has developed over the last few years as a sub-discipline of computer science and machine learning to mitigate these concerns (as well as concerns of fairness and transparency in deep modeling). In this article, our focus is on explaining which inputs are important in models for predicting environmental data. In particular, we consider three general methods for explainability that are model agnostic and thus applicable across a breadth of models without internal explainability: "feature shuffling", "interpretable local surrogates", and "occlusion analysis". We describe particular implementations of each of these and illustrate their use with a variety of models, all applied to the problem of long-lead forecasting of monthly soil moisture in the North American corn belt given sea surface temperature anomalies in the Pacific Ocean.
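
To make the "feature shuffling" idea concrete, the sketch below illustrates permutation importance on synthetic data: a feature's importance is measured by how much predictive skill degrades when that feature's values are randomly shuffled. This is a minimal illustration, not the article's implementation; the synthetic data, scikit-learn random forest, and mean squared error loss are assumptions made for the example.

```python
# Minimal sketch of "feature shuffling" (permutation importance).
# Assumptions for illustration only: synthetic data, a scikit-learn
# RandomForestRegressor, and MSE as the loss; the article's models and
# data (SST anomalies -> soil moisture) are not reproduced here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in data: 500 samples, 10 predictors; the response
# depends only on the first two predictors.
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
baseline = mean_squared_error(y, model.predict(X))

# Shuffle one feature at a time; the increase in loss over the baseline
# is that feature's importance score.
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(mean_squared_error(y, model.predict(X_perm)) - baseline)

for j, imp in enumerate(importances):
    print(f"feature {j}: increase in MSE after shuffling = {imp:.3f}")
```

In practice the shuffling would typically be repeated several times and evaluated on held-out data to obtain a more stable importance estimate.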

Keywords: LIME; Shapley values; explainable AI; feature shuffling; machine learning.