An Explainable Vision Transformer Model Based White Blood Cells Classification and Localization

Oguzhan Katar; Ozal Yildirim

doi:10.3390/diagnostics13142459


Diagnostics (Basel). 2023 Jul 24;13(14):2459. doi: 10.3390/diagnostics13142459.

Authors

Oguzhan Katar¹, Ozal Yildirim¹,²

Affiliations

¹ Department of Software Engineering, Firat University, Elazig 23119, Turkey.
² Department of Artificial Intelligence and Data Engineering, Firat University, Elazig 23119, Turkey.

Abstract

White blood cells (WBCs) are crucial components of the immune system that play a vital role in defending the body against infections and diseases. Identifying WBC subtypes is useful in the detection of various diseases, such as infections, leukemia, and other hematological malignancies. Manual screening of blood films is time-consuming and subjective, leading to inconsistencies and errors. Models based on convolutional neural networks (CNNs) can automate such classification, but they are limited in capturing long-range dependencies and global context. This paper proposes an explainable Vision Transformer (ViT) model for automatic WBC detection from blood films. The proposed model uses a self-attention mechanism to extract features from input images. It was trained and validated on a public dataset of 16,633 samples containing five different types of WBCs. In experiments on classifying these five WBC types, the model achieved an accuracy of 99.40%. Moreover, examination of the model's misclassified test samples revealed a correlation between incorrect predictions and the presence or absence of granules in the cell samples. To validate this observation, we divided the dataset into two classes, Granulocytes and Agranulocytes, and conducted a second training run. The resulting ViT model, trained for binary classification, achieved an accuracy of 99.70%, recall of 99.54%, precision of 99.32%, and F1 score of 99.43% in the test phase. To assess the reliability of the ViT model's predictions, we employed the Score-CAM algorithm to visualize the pixel regions on which the model focuses when making predictions. Owing to its explainable structure and its superior performance compared with similar studies in the literature, our proposed method is suitable for clinical use. Classifying and localizing WBCs with this model can facilitate the detection and reporting process for pathologists.
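The secondary experiment in the abstract relabels the five WBC classes as Granulocytes or Agranulocytes and reports accuracy, recall, precision and F1. The Python sketch below shows one way to perform that relabeling and compute the four metrics from a list of predictions. It is not the authors' code. The abstract does not name the five classes, so the class names used here (neutrophil, eosinophil and basophil as granulocytes; lymphocyte and monocyte as agranulocytes, the standard WBC grouping) are an assumption. The helper names `to_binary`, `f1_score` and `binary_metrics`, and the choice of Granulocyte as the positive class, are also hypothetical. The abstract does not say which class was treated as positive or whether the metrics were averaged.

```python
# Sketch only: class names, helper names and the positive class are assumptions,
# not taken from the paper.

# Assumed five WBC types, grouped by the presence of cytoplasmic granules.
GRANULOCYTES = {"neutrophil", "eosinophil", "basophil"}
AGRANULOCYTES = {"lymphocyte", "monocyte"}


def to_binary(label: str) -> str:
    """Map a five-class WBC label to 'Granulocyte' or 'Agranulocyte'."""
    key = label.strip().lower()
    if key in GRANULOCYTES:
        return "Granulocyte"
    if key in AGRANULOCYTES:
        return "Agranulocyte"
    raise ValueError(f"unknown WBC type: {label!r}")


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true, y_pred, positive="Granulocyte"):
    """Accuracy, recall, precision and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    five_class = ["Neutrophil", "Lymphocyte", "Eosinophil", "Monocyte"]
    y_true = [to_binary(c) for c in five_class]
    y_pred = ["Granulocyte", "Granulocyte", "Granulocyte", "Agranulocyte"]
    print(binary_metrics(y_true, y_pred))
    # Harmonic mean of the precision and recall reported in the abstract.
    print(round(f1_score(0.9932, 0.9954), 4))
```

Because F1 is the harmonic mean of precision and recall, the reported binary precision (99.32%) and recall (99.54%) give 2 × 0.9932 × 0.9954 / (0.9932 + 0.9954) ≈ 0.9943. This agrees with the reported F1 score of 99.43%.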

Keywords: Score-CAM; deep learning; explainable AI models; vision transformers; white blood cells.

Grants and funding

This research received no external funding.