Explainable artificial intelligence in forensics: Realistic explanations for number of contributor predictions of DNA profiles

Forensic Sci Int Genet. 2022 Jan;56:102632. doi: 10.1016/j.fsigen.2021.102632. Epub 2021 Nov 21.

Abstract

Machine learning achieves good accuracy in determining the number of contributors (NOC) in short tandem repeat (STR) mixture DNA profiles. However, the models used so far are not understandable to users, as they only output a prediction without any reasoning for that conclusion. Therefore, we leverage techniques from the field of explainable artificial intelligence (XAI) to help users understand why specific predictions are made. Where previous attempts at explainability for NOC estimation have relied upon simpler, more understandable models that achieve lower accuracy, we use techniques that can be applied to any machine learning model. Our explanations incorporate SHAP values and counterfactual examples for each prediction into a single visualization. Existing methods for generating counterfactuals focus on uncorrelated features. This makes them inappropriate for the highly correlated features derived from STR data for NOC estimation, as these techniques simulate combinations of features that could not have resulted from an STR profile. For this reason, we have constructed a new counterfactual method, Realistic Counterfactuals (ReCo), which generates realistic counterfactual explanations for correlated data. We show that ReCo outperforms state-of-the-art methods on traditional metrics, as well as on a novel realism score. A user evaluation of the visualization shows positive opinions of end-users, which is ultimately the most appropriate metric for assessing explanations in real-world settings.
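The two explanation ingredients named above can be illustrated with a minimal, self-contained sketch. This is not the paper's ReCo algorithm or its NOC model; all names, the toy linear scorer, and the data are illustrative assumptions. It shows (a) exact Shapley-style attributions for a linear scorer and (b) the core idea of keeping counterfactuals "realistic" for correlated features by restricting candidates to observed instances rather than perturbing features independently.

```python
import numpy as np

# Hypothetical stand-in for an NOC classifier: a fixed linear scorer over two
# correlated "profile features" (illustrative only, not the paper's model).
W = np.array([1.5, -2.0])

def predict_noc(x):
    # Pretend: score > 0 -> "3 contributors", otherwise "2 contributors".
    return 3 if x @ W > 0 else 2

# Simulated observations with strongly correlated features, mimicking the
# correlation structure of features derived from an STR profile.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
               z + 0.1 * rng.normal(size=(200, 1))])

# (a) For a linear scorer, SHAP-style attributions reduce to the exact
# closed form w_i * (x_i - E[x_i]) per feature.
x = np.array([0.5, 0.6])
attributions = W * (x - X.mean(axis=0))

# (b) A realistic counterfactual: instead of freely flipping features (which
# can produce feature combinations no STR profile could yield), pick the
# nearest *observed* instance that receives the desired prediction.
def realistic_counterfactual(x, target, X_obs):
    candidates = [xi for xi in X_obs if predict_noc(xi) == target]
    return min(candidates, key=lambda xi: np.linalg.norm(xi - x))

cf = realistic_counterfactual(x, target=3, X_obs=X)
print(predict_noc(x), predict_noc(cf))  # original vs. counterfactual class
```

Restricting the search to observed instances is one simple way to guarantee that a counterfactual lies on the data manifold; it trades the minimal-change property of unconstrained methods for realism, which is the tension the abstract's realism score is designed to measure.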

Keywords: Counterfactual explanations; DNA mixtures; Explainable artificial intelligence; Machine learning; Number of contributors.

MeSH terms

  • Artificial Intelligence*
  • DNA / genetics
  • Forensic Medicine
  • Humans
  • Machine Learning*

Substances

  • DNA