Explainable COVID-19 detection using fractal dimension and vision transformer with Grad-CAM on cough sounds

Nebras Sobahi; Orhan Atila; Erkan Deniz; Abdulkadir Sengur; U Rajendra Acharya

doi:10.1016/j.bbe.2022.08.005

Biocybern Biomed Eng. 2022 Jul-Sep;42(3):1066-1080. doi: 10.1016/j.bbe.2022.08.005. Epub 2022 Sep 6.

Authors

Nebras Sobahi¹, Orhan Atila², Erkan Deniz², Abdulkadir Sengur², U Rajendra Acharya³ ⁴ ⁵

Affiliations

¹ King Abdulaziz University, Department of Electrical and Computer Engineering, Jeddah, Saudi Arabia.
² Firat University, Technology Faculty, Electrical and Electronics Engineering Department, Elazig, Turkey.
³ Ngee Ann Polytechnic, Department of Electronics and Computer Engineering, 599489, Singapore.
⁴ Biomedical Engineering, School of Science and Technology, SUSS University, Singapore.
⁵ Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan.

Abstract

The polymerase chain reaction (PCR) test is not only time-intensive but also a contact method that puts healthcare personnel at risk. Thus, contactless and fast detection tests are more valuable. Cough sound is an important indicator of COVID-19, and in this paper, a novel explainable scheme is developed for cough sound-based COVID-19 detection. In the presented work, the input audio, which may contain other sounds, is initially segmented into overlapping parts, and each segment is labeled. The deep Yet Another Mobile Network (YAMNet) model is considered for this purpose. After labeling, the segments labeled as cough are cropped and concatenated to reconstruct the pure cough sounds. Then, four fractal dimension (FD) calculation methods are employed to acquire the FD coefficients of the cough sound with an overlapping sliding window, and the coefficients form a matrix. The constructed matrices are then used to form the fractal dimension images. Finally, a pretrained vision transformer (ViT) model is used to classify the constructed images into COVID-19, healthy, and symptomatic classes. In this work, we demonstrate the performance of the ViT on cough sound-based COVID-19 detection, and a visual explanation of the inner workings of the ViT model is shown. Three publicly available cough sound datasets, namely COUGHVID, VIRUFY, and COSWARA, are used in this study. We have obtained 98.45%, 98.15%, and 97.59% accuracy for the COUGHVID, VIRUFY, and COSWARA datasets, respectively. Our developed model obtained the highest performance compared to the state-of-the-art methods and is ready to be tested in real-world applications.

Keywords: COVID-19 detection; Cough sound; Fractal dimension; Vision Transformer; YAMNet.
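
The sliding-window fractal dimension step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the four FD methods, so the two estimators used here (Katz and Petrosian), the window and hop sizes, the matrix layout (one row per method, one column per window), and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def katz_fd(x):
    """Katz fractal dimension of a 1-D signal (amplitude-only variant)."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))
    n = len(steps)                       # number of steps along the curve
    length = steps.sum()                 # total curve length
    extent = np.max(np.abs(x - x[0]))    # farthest excursion from first sample
    if length == 0 or extent == 0:
        return 1.0                       # flat window: treat as a straight line
    return np.log10(n) / (np.log10(n) + np.log10(extent / length))

def petrosian_fd(x):
    """Petrosian fractal dimension based on sign changes of the derivative."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = np.diff(x)
    n_delta = np.count_nonzero(np.diff(np.sign(diff)))  # sign changes
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))

def fd_matrix(signal, win=1024, hop=512, methods=(katz_fd, petrosian_fd)):
    """Apply each FD estimator to overlapping windows of the signal.

    Returns an array of shape (len(methods), n_windows). The layout is an
    assumption; the paper may arrange its coefficients differently.
    """
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - win + 1, hop)
    return np.array([[m(signal[s:s + win]) for s in starts] for m in methods])

# Hypothetical input: random noise standing in for a cleaned cough recording.
rng = np.random.default_rng(0)
cough = rng.standard_normal(4096)
fd = fd_matrix(cough)
print(fd.shape)
```

A matrix like `fd` could then be rescaled to an image range and resized to the input size a ViT expects; how the paper performs that mapping is not stated in the abstract.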