Using interpretability approaches to update "black-box" clinical prediction models: an external validation study in nephrology

Harry Freitas da Cruz; Boris Pfahringer; Tom Martensen; Frederic Schneider; Alexander Meyer; Erwin Böttinger; Matthieu-P Schapranow

doi:10.1016/j.artmed.2020.101982

Artif Intell Med. 2021 Jan:111:101982. doi: 10.1016/j.artmed.2020.101982. Epub 2020 Nov 7.

Authors

Harry Freitas da Cruz¹, Boris Pfahringer², Tom Martensen³, Frederic Schneider³, Alexander Meyer², Erwin Böttinger⁴, Matthieu-P Schapranow³

Affiliations

¹ Digital Health Center, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam, Germany; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. Electronic address: Harry.FreitasDaCruz@hpi.de.
² German Heart Center Berlin, Department of Cardiothoracic and Vascular Surgery, Augustenburger Platz 1, 13353 Berlin, Germany.
³ Digital Health Center, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam, Germany.
⁴ Digital Health Center, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam, Germany; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

PMID: 33461682
DOI: 10.1016/j.artmed.2020.101982

Abstract

Despite advances in machine learning-based clinical prediction models, only a few such models are actually deployed in clinical contexts. Among other reasons, this is due to a lack of validation studies. In this paper, we present and discuss the validation results of a machine learning model for the prediction of acute kidney injury in cardiac surgery patients, initially developed on the MIMIC-III dataset, when applied to an external cohort from an American research hospital. To help account for the performance differences observed, we used interpretability methods based on feature importance, which allowed experts to scrutinize model behavior at both the global and local level and to gain further insight into why the model did not behave as expected on the validation cohort. The knowledge gleaned during derivation can potentially assist model updating during validation, leading to simpler and more generalizable models. We argue that practitioners should consider interpretability methods as a further tool to help explain performance differences and inform model updating in validation studies.

Keywords: Clinical predictive modeling; Interpretability methods; Nephrology; Validation.
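
The idea in the abstract of using feature importance to explain a performance gap between a derivation and a validation cohort can be illustrated with a small sketch. The abstract does not say which interpretability methods or model were used, so everything below is an assumption made for illustration: permutation importance stands in for "feature importance", the "black-box" model is a fixed toy scoring function, the cohorts are synthetic, and the names feature_a and feature_b are hypothetical. It covers only the global level, not local explanations.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return baseline AUC and, per feature, the mean AUC drop after shuffling it."""
    rng = random.Random(seed)
    baseline = auc([predict(row) for row in X], y)
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - auc([predict(r) for r in shuffled], y)
        drops.append(total / n_repeats)
    return baseline, drops

def make_cohort(n, true_weights, seed):
    """Synthetic cohort: the outcome follows a noisy linear rule (assumption)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in true_weights] for _ in range(n)]
    y = [int(sum(w * v for w, v in zip(true_weights, row)) + rng.gauss(0, 0.5) > 0)
         for row in X]
    return X, y

# Hypothetical feature names; the toy model weights both features equally.
feature_names = ["feature_a", "feature_b"]
def black_box(row):
    return row[0] + row[1]

# Derivation cohort: both features drive the outcome.
X_dev, y_dev = make_cohort(300, [1.0, 1.0], seed=1)
# Validation cohort: feature_b carries no signal (simulated cohort shift).
X_val, y_val = make_cohort(300, [1.0, 0.0], seed=2)

auc_dev, imp_dev = permutation_importance(black_box, X_dev, y_dev)
auc_val, imp_val = permutation_importance(black_box, X_val, y_val)

print(f"AUC derivation={auc_dev:.3f} validation={auc_val:.3f}")
for name, d, v in zip(feature_names, imp_dev, imp_val):
    print(f"{name}: importance derivation={d:.3f} validation={v:.3f}")
```

In this toy setting, a feature that matters on the derivation cohort but shows near-zero importance on the validation cohort is a candidate for removal or re-weighting when updating the model, which is the kind of insight the abstract describes.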

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Acute Kidney Injury* / diagnosis
Cohort Studies
Hospitals
Humans
Machine Learning
Nephrology*

Grants and funding

S10 OD026880/OD/NIH HHS/United States