Speech as a Biomarker for COVID-19 Detection Using Machine Learning

Mohammed Usman; Vinit Kumar Gunjan; Mohd Wajid; Mohammed Zubair; Kazy Noor-E-Alam Siddiquee

doi:10.1155/2022/6093613


Comput Intell Neurosci. 2022 Apr 18:2022:6093613. doi: 10.1155/2022/6093613. eCollection 2022.

Authors

Mohammed Usman¹, Vinit Kumar Gunjan², Mohd Wajid³, Mohammed Zubair¹, Kazy Noor-E-Alam Siddiquee⁴

Affiliations

¹ Department of Electrical Engineering, King Khalid University, Abha 61411, Saudi Arabia.
² Department of Computer Science and Engineering, CMR Institute of Technology, Hyderabad, India.
³ Department of Electronics Engineering, ZHCET, Aligarh Muslim University, Aligarh 202002, India.
⁴ Department of Computer Science and Engineering, University of Science and Technology, Chittagong, Bangladesh.

Abstract

The use of speech as a biomedical signal for diagnosing COVID-19 is investigated using statistical analysis of speech spectral features and machine-learning-based classification algorithms. It is established that the spectral features of speech, obtained by computing the short-time Fourier transform (STFT), are altered in a statistical sense by COVID-19-related physiological changes. These spectral features are then used as inputs to machine-learning-based classification algorithms that classify each sample as coming from a COVID-19-positive individual or not. Speech samples from healthy individuals as well as from "asymptomatic" COVID-19-positive individuals have been used in this study. It is shown that the RMS error of statistical distribution fitting is higher for speech samples from COVID-19-positive individuals than for those from healthy individuals. Five state-of-the-art machine learning classification algorithms have also been analyzed, and their performance evaluation metrics are presented. The model parameters are tuned to minimize the misclassification of COVID-19-positive individuals as COVID-19-negative, since the cost of this misclassification is higher than that of the opposite error. The best performance in terms of the "recall" metric is observed for the Decision Forest algorithm, which gives a recall value of 0.7892.
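The three quantitative ideas in the abstract (STFT spectral features, the RMS error of a statistical distribution fit, and recall-oriented tuning) can be illustrated with a small dependency-free sketch. The abstract does not state the frame length, window, spectral feature, distribution family, or tuning procedure, so everything here is an assumption: a periodic Hann window with a naive DFT, a moment-fitted Gaussian compared against a density-normalised histogram, and decision-threshold selection as a stand-in for the authors' parameter tuning. All function names are hypothetical, and this is not the authors' pipeline or their Decision Forest model.

```python
import cmath
import math
import random


def stft_magnitudes(signal, frame_len=64, hop=32):
    """One-sided magnitude spectra of Hann-windowed frames (naive DFT).

    Hypothetical helper; frame_len and hop are illustrative assumptions.
    """
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # bins 0 .. frame_len/2
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def gaussian_fit_rms_error(values, bins=20):
    """RMS error between a density-normalised histogram and a Gaussian
    fitted by moments (the Gaussian family is an assumption)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # clamp v == hi
    sq_err = 0.0
    for i, c in enumerate(counts):
        centre = lo + (i + 0.5) * width
        empirical = c / (n * width)
        fitted = math.exp(-0.5 * ((centre - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        sq_err += (empirical - fitted) ** 2
    return math.sqrt(sq_err / bins)


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN), with 1 = COVID-19 positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def threshold_for_recall(scores, y_true, target=0.8):
    """Highest score threshold whose recall meets `target` (hypothetical
    cost-sensitive rule: favour catching positives over precision)."""
    for thr in sorted(set(scores), reverse=True):
        preds = [1 if s >= thr else 0 for s in scores]
        if recall(y_true, preds) >= target:
            return thr
    return min(scores)
```

Usage, on toy data only: `stft_magnitudes([math.sin(2 * math.pi * 8 * n / 64) for n in range(512)])` yields 15 frames of 33 bins whose peak sits at bin 8, and `threshold_for_recall([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 1], target=0.6)` lowers the threshold to 0.4 so that two of the three positives are caught. The naive O(N²) DFT only keeps the sketch dependency-free; a real pipeline would use an FFT-based routine such as `scipy.signal.stft`.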

MeSH terms

Algorithms
Biomarkers
COVID-19* / diagnosis
Humans
Machine Learning
Speech*

Substances

Biomarkers