Flow cytometry datasets consisting of peripheral blood and bone marrow samples for the evaluation of explainable artificial intelligence methods

Michael C Thrun; Jörg Hoffmann; Maximilian Röhnert; Malte von Bonin; Uta Oelschlägel; Cornelia Brendel; Alfred Ultsch

doi:10.1016/j.dib.2022.108382

Data Brief. 2022 Jun 17:43:108382. doi: 10.1016/j.dib.2022.108382. eCollection 2022 Aug.

Authors

Michael C Thrun¹ ², Jörg Hoffmann², Maximilian Röhnert³, Malte von Bonin³, Uta Oelschlägel³, Cornelia Brendel², Alfred Ultsch¹

Affiliations

¹ Databionics, Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Straße 6 D-35032 Marburg, Germany.
² Department of Hematology, Oncology and Immunology, Philipps-University, Baldinger Str., D-35032 Marburg, Germany.
³ Medizinische Klinik und Poliklinik I Bereich Innere Medizin / Hämatologie und Onkologie, Universitätsklinikum Carl Gustav Carus Dresden, Germany.

Abstract

Three flow cytometry datasets are provided, consisting of diagnostic samples of either peripheral blood (pB) or bone marrow (BM) from patients without any sign of bone marrow disease, measured at two different health care centers. In flow cytometry, cells pass rapidly through a laser beam one at a time; for each patient sample, two light scatter parameters and eight surface parameters are measured for more than 100,000 cells. The technology swiftly characterizes cells of the immune system at the single-cell level based on antigens presented on the cell surface that are targeted by a set of fluorochrome-conjugated antibodies. The first dataset consists of N=14 sample files measured in Marburg and the second dataset of N=44 data files measured in Dresden, of which half are BM samples and half are pB samples. The third dataset contains N=25 healthy bone marrow samples and N=25 leukemia bone marrow samples measured in Marburg. The data have been log-scaled to the range from zero to six and used to identify cell populations that are both meaningful to the clinician and relevant to the distinction of pB vs. BM and of BM vs. leukemia. Explainable artificial intelligence methods should distinguish these samples and provide meaningful explanations for the classification while requiring no more than several hours of computation. The data described in this article are available in Mendeley Data [1].

Keywords: Benchmarking; Cell populations; Explainable artificial intelligence; Flow cytometry; Human blood; Human bone marrow; Immunophenotyping; Interpretable machine learning.