Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests

Clin Chem Lab Med. 2020 Oct 21;59(2):421-431. doi: 10.1515/cclm-2020-1294.

Abstract

Objectives: The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), has known shortcomings, such as long turnaround time, potential shortage of reagents, false-negative rates around 15-20%, and the need for expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative.

Methods: Three different training data sets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted to San Raphael Hospital (OSR) from February to May 2020, were used to develop machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, blood gas analysis and CO-oximetry values, age, sex and specific symptoms at triage) and two sub-datasets (the COVID-specific and CBC datasets, with 32 and 21 features, respectively). A further 58 cases (50% COVID-19 positive) from another hospital, and 54 negative patients collected in 2018 at OSR, were used for internal-external and external validation.

Results: We developed five ML models: for the complete OSR dataset, the area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.83 to 0.90; for the COVID-specific dataset from 0.83 to 0.87; and for the CBC dataset from 0.74 to 0.86. The internal-external and external validations also achieved good results: AUC from 0.75 to 0.78 and specificity from 0.92 to 0.96, respectively.
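
As an illustration of the two metrics reported above, the following minimal sketch computes AUC and specificity from predicted scores. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example. It also shows why a validation cohort containing only negative patients supports a specificity estimate but not an AUC.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity(labels, scores, threshold=0.5):
    """Fraction of true negatives among all negative cases.
    The 0.5 threshold is an assumption, not a value from the paper."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("specificity needs at least one negative case")
    true_neg = sum(1 for s in neg if s < threshold)
    return true_neg / len(neg)


# Toy mixed cohort (hypothetical data): 1 = COVID-19 positive, 0 = negative.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc_mixed = roc_auc(labels, scores)       # 8 of 9 positive/negative pairs ranked correctly
spec_mixed = specificity(labels, scores)  # 2 of 3 negatives fall below the threshold

# Toy negative-only cohort (hypothetical data): specificity is defined, AUC is not.
neg_labels = [0, 0, 0, 0]
neg_scores = [0.1, 0.2, 0.7, 0.3]
spec_neg_only = specificity(neg_labels, neg_scores)  # 3 of 4 negatives below the threshold
```

Calling `roc_auc(neg_labels, neg_scores)` raises `ValueError`, since ranking positives against negatives is impossible without positive cases.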

Conclusions: ML can be applied to blood tests as both an adjunct and an alternative to rRT-PCR for the fast and cost-effective identification of COVID-19-positive patients. This is especially useful in developing countries, or in countries facing an increase in contagions.

Keywords: COVID-19; SARS-CoV-2; blood laboratory tests; complete blood count; gradient boosted decision tree; machine learning.

MeSH terms

  • Algorithms
  • Area Under Curve
  • Blood Cell Count
  • Blood Chemical Analysis / methods*
  • COVID-19 / blood*
  • COVID-19 Testing / methods*
  • Datasets as Topic
  • Hematologic Tests / methods*
  • Humans
  • Machine Learning*
  • SARS-CoV-2
  • Sensitivity and Specificity