Flow cytometry datasets consisting of peripheral blood and bone marrow samples for the evaluation of explainable artificial intelligence methods

Data Brief. 2022 Jun 17;43:108382. doi: 10.1016/j.dib.2022.108382. eCollection 2022 Aug.

Abstract

Three flow cytometry datasets are provided, consisting of diagnostic samples of either peripheral blood (pB) or bone marrow (BM) from patients without any sign of bone marrow disease, measured at two different health care centers. In flow cytometry, cells pass rapidly through a laser beam one at a time; for each patient sample, two light-scatter parameters and eight surface parameters are measured for more than 100,000 cells. The technology swiftly characterizes cells of the immune system at the single-cell level based on antigens presented on the cell surface that are targeted by a set of fluorochrome-conjugated antibodies. The first dataset consists of N=14 sample files measured in Marburg, and the second dataset consists of N=44 data files measured in Dresden, of which half are BM samples and half are pB samples. The third dataset contains N=25 healthy bone marrow samples and N=25 leukemia bone marrow samples measured in Marburg. The data have been log-scaled to a range between zero and six and used to identify cell populations that are simultaneously meaningful to the clinician and relevant to the distinction of pB vs. BM and of BM vs. leukemia. Explainable artificial intelligence methods should distinguish these samples and provide meaningful explanations for the classification without taking more than several hours to compute their results. The data described in this article are available in Mendeley Data [1].
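The log scaling to a range between zero and six could look roughly like the following sketch. The abstract does not specify the exact transformation, so the base-10 logarithm, the clipping at 1 and 10^6, and the names `log_scale` and `sample` are assumptions made for illustration only; the array of random values is a hypothetical stand-in, not data from the datasets.

```python
import numpy as np


def log_scale(raw, upper=6.0):
    """Map raw intensities to log10 values in the range [0, upper].

    Assumption: values below 1 are raised to 1 (so they map to 0) and
    values above 10**upper are capped (so they map to `upper`).
    """
    raw = np.asarray(raw, dtype=float)
    clipped = np.clip(raw, 1.0, 10.0 ** upper)
    return np.log10(clipped)


# Hypothetical sample: 5 cells x 10 parameters
# (2 light-scatter parameters + 8 surface parameters).
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 2.0e6, size=(5, 10))
scaled = log_scale(sample)
```

Clipping before taking the logarithm keeps zero or negative raw values from producing undefined results; other conventions (for example, an arcsinh or biexponential transform) are also common in flow cytometry and may differ from what was applied here.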

Keywords: Benchmarking; Cell populations; Explainable artificial intelligence; Flow cytometry; Human blood; Human bone marrow; Immunophenotyping; Interpretable machine learning.