QCovSML: A reliable COVID-19 detection system using CBC biomarkers by a stacking machine learning model

Tawsifur Rahman; Amith Khandakar; Farhan Fuad Abir; Md Ahasan Atick Faisal; Md Shafayet Hossain; Kanchon Kanti Podder; Tariq O Abbas; Mohammed Fasihul Alam; Saad Bin Kashem; Mohammad Tariqul Islam; Susu M Zughaier; Muhammad E H Chowdhury

doi:10.1016/j.compbiomed.2022.105284

Comput Biol Med. 2022 Apr:143:105284. doi: 10.1016/j.compbiomed.2022.105284. Epub 2022 Feb 12.

Affiliations

¹ Department of Electrical Engineering, Qatar University, Doha, 2713, Qatar.
² Department of Electrical and Electronics Engineering, University of Dhaka, Dhaka, 1000, Bangladesh.
³ Dept. of Electrical, Electronics and Systems Engineering, Universiti Kebangsaan Malaysia, Bangi, Selangor, 43600, Malaysia.
⁴ Department of Biomedical Physics & Technology, University of Dhaka, Dhaka, 1000, Bangladesh.
⁵ Urology Division, Surgery Department, Sidra Medicine, Doha, 26999, Qatar.
⁶ Department of Public Health, College of Health Sciences, QU Health, Qatar University, Doha, 2713, Qatar.
⁷ Department of Computing Science, AFG College with the University of Aberdeen, Doha, Qatar.
⁸ Department of Basic Medical Sciences, College of Medicine, QU Health, Qatar University, Doha, 2713, Qatar.
⁹ Department of Electrical Engineering, Qatar University, Doha, 2713, Qatar. Electronic address: mchowdhury@qu.edu.qa.

Abstract

The reverse transcription-polymerase chain reaction (RT-PCR) test is considered the current gold standard for detecting coronavirus disease (COVID-19), but it has shortcomings: a comparatively long turnaround time, false-negative rates of around 20-25%, and costly equipment. Finding an efficient, robust, accurate, widely available, and accessible alternative to RT-PCR for COVID-19 diagnosis is therefore of utmost importance. This study proposes a COVID-19 detection system based on complete blood count (CBC) biomarkers and a stacking machine learning (SML) model, which could be a faster and less expensive alternative. The study used seven publicly available datasets. The largest, consisting of fifteen CBC biomarkers collected from 1624 patients (52% COVID-19 positive) admitted to San Raphael Hospital, Italy, from February to May 2020, was used to train and validate the proposed model. Five different feature selection techniques identified white blood cell count, monocytes (%), lymphocytes (%), and age, all recorded at hospital admission, as important predictors of COVID-19. Our stacking model produced the best performance, with weighted precision, sensitivity, specificity, overall accuracy, and F1-score of 91.44%, 91.44%, 91.44%, 91.45%, and 91.45%, respectively, improving on other state-of-the-art machine learning classifiers. Finally, a nomogram-based scoring system (QCovSML) was constructed using this stacking approach to identify COVID-19 patients. The cut-off value of the QCovSML score for classifying COVID-19 and non-COVID patients was 4.8. Six datasets from three different countries were used to externally validate the proposed model and evaluate its generalizability and robustness. The nomogram demonstrated good calibration and discrimination, with an area under the curve (AUC) of 0.961 for the internal cohort and an average AUC of 0.967 across the external validation cohorts. External validation showed average weighted precision, sensitivity, F1-score, specificity, and overall accuracy of 92.02%, 95.59%, 93.73%, 90.54%, and 93.34%, respectively.
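
The abstract reports "weighted" precision, sensitivity, specificity, and F1-score without defining the weighting. The sketch below shows one common reading, in which each metric is computed per class and then averaged by class support. This reading is an assumption and is not confirmed by the abstract. The function name `weighted_metrics` and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def weighted_metrics(tp, fn, fp, tn):
    """Support-weighted binary metrics from one confusion matrix.

    tp, fn: actual positives predicted positive / negative.
    fp, tn: actual negatives predicted positive / negative.
    Assumes every denominator below is non-zero.
    """
    n_pos, n_neg = tp + fn, fp + tn
    total = n_pos + n_neg

    def per_class(tp_c, fn_c, fp_c, tn_c):
        # Metrics with one class treated as the target class.
        precision = tp_c / (tp_c + fp_c)
        sensitivity = tp_c / (tp_c + fn_c)
        specificity = tn_c / (tn_c + fp_c)
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
        return precision, sensitivity, specificity, f1

    pos = per_class(tp, fn, fp, tn)  # positive class as target
    neg = per_class(tn, fp, fn, tp)  # negative class as target
    w_pos, w_neg = n_pos / total, n_neg / total  # class-support weights

    names = ("precision", "sensitivity", "specificity", "f1")
    result = {name: w_pos * p + w_neg * n for name, p, n in zip(names, pos, neg)}
    result["accuracy"] = (tp + tn) / total
    return result


# Hypothetical counts, for illustration only.
metrics = weighted_metrics(tp=90, fn=10, fp=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this weighting, the weighted sensitivity of a binary classifier is algebraically equal to its overall accuracy. That is compatible with the near-identical internal-validation figures above, though it does not by itself establish which averaging scheme the authors used.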

Keywords: COVID-19; Complete blood count (CBC); Detection; RT-PCR; Stacking machine learning.