External validation of Machine Learning models for COVID-19 detection based on Complete Blood Count

Andrea Campagner; Anna Carobene; Federico Cabitza

doi:10.1007/s13755-021-00167-3

Health Inf Sci Syst. 2021 Oct 23;9(1):37. doi: 10.1007/s13755-021-00167-3. eCollection 2021 Dec.

Authors

Andrea Campagner¹, Anna Carobene², Federico Cabitza¹

Affiliations

¹ DISCo, Università degli Studi di Milano-Bicocca, Milan, Italy.
² Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy.

Abstract

Purpose: rRT-PCR testing for COVID-19 diagnosis suffers from long turnaround times, potential reagent shortages, high false-negative rates and high costs. Routine hematochemical tests are a faster and less expensive alternative. Machine Learning (ML) has therefore been applied to hematological parameters to develop diagnostic tools that help clinicians promptly manage positive patients. However, few ML models have been externally validated, which leaves their real-world applicability unclear.

Methods: We externally validate 6 state-of-the-art diagnostic ML models, based on Complete Blood Count (CBC) and trained on a dataset encompassing 816 COVID-19 positive cases. The external validation was performed on two datasets, collected at two different hospitals in northern Italy and encompassing 163 and 104 COVID-19 positive cases, respectively. The models were assessed in terms of both error rate and calibration.
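
As an illustration of what assessing both error rate and calibration on two external sites can involve, the following minimal Python sketch computes discrimination (AUC), calibration error (Brier score), sensitivity and specificity per site and then averages them across sites. It is not the authors' pipeline: the function names, the 0.5 decision threshold and the toy labels and probabilities are all assumptions made for illustration only.

```python
# Minimal sketch of a two-site external validation (illustrative only).
# All names, the 0.5 threshold and the toy data are assumptions, not the
# authors' code or data.

def auc(y_true, probs):
    """AUC as the probability that a positive case outranks a negative one
    (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def sensitivity_specificity(y_true, probs, threshold=0.5):
    """Sensitivity and specificity when predicting positive at p >= threshold."""
    pairs = list(zip(y_true, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def external_validation(sites, threshold=0.5):
    """Evaluate a frozen model's predictions on each site, then average."""
    per_site = {}
    for name, (y_true, probs) in sites.items():
        sens, spec = sensitivity_specificity(y_true, probs, threshold)
        per_site[name] = {
            "auc": auc(y_true, probs),
            "brier": brier_score(y_true, probs),
            "sensitivity": sens,
            "specificity": spec,
        }
    metrics = ("auc", "brier", "sensitivity", "specificity")
    average = {m: sum(r[m] for r in per_site.values()) / len(per_site)
               for m in metrics}
    return per_site, average

# Toy data: (true labels, predicted probabilities) for two hypothetical sites.
sites = {
    "site_A": ([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1]),
    "site_B": ([1, 0, 1, 0], [0.8, 0.5, 0.4, 0.2]),
}
per_site, average = external_validation(sites)
print(per_site)
print(average)
```

In a sketch of this kind the model is trained once on the development data and never refitted on the external sites, so the per-site metrics reflect transportability rather than in-sample fit. A lower Brier score indicates predicted probabilities that sit closer to the observed outcomes.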

Results and conclusion: We report an average AUC of 95% and an average Brier score of 0.11, outperforming existing ML methods and showing good cross-site transportability. The best-performing model (SVM) reported an average AUC of 97.5% (Sensitivity: 87.5%, Specificity: 94%), comparable with the performance of RT-PCR, and was also the best calibrated. Because CBC exams are rapidly available, the validated models can be useful in the early identification of potential COVID-19 patients and in multiple test settings.

Keywords: COVID-19; Calibration; Complete Blood Count; External validation; Machine Learning.