Consistent comparison of symptom-based methods for COVID-19 infection detection

Jesús Rufino; Juan Marcos Ramírez; Jose Aguilar; Carlos Baquero; Jaya Champati; Davide Frey; Rosa Elvira Lillo; Antonio Fernández-Anta

doi:10.1016/j.ijmedinf.2023.105133

Int J Med Inform. 2023 Sep:177:105133. doi: 10.1016/j.ijmedinf.2023.105133. Epub 2023 Jun 29.

Authors

Jesús Rufino¹, Juan Marcos Ramírez², Jose Aguilar³, Carlos Baquero⁴, Jaya Champati¹, Davide Frey⁵, Rosa Elvira Lillo⁶, Antonio Fernández-Anta¹

Affiliations

¹ IMDEA Networks Institute, 28918, Madrid, Spain.
² IMDEA Networks Institute, 28918, Madrid, Spain. Electronic address: juan.ramirez@imdea.org.
³ IMDEA Networks Institute, 28918, Madrid, Spain; CEMISID, Universidad de Los Andes, Mérida, 5101, Venezuela; CIDITIC, Universidad EAFIT, Medellín, Colombia.
⁴ Universidade do Minho and INESC TEC, Braga, Portugal.
⁵ Inria Rennes, Rennes, France.
⁶ Universidad Carlos III, Madrid, Spain.

PMID: 37393765

Abstract

Background: During the COVID-19 pandemic, various methods for detecting COVID-19-positive cases from self-reported information were introduced to provide quick diagnostic tools for planning and managing healthcare resources effectively. These methods typically identify positive cases from a particular combination of symptoms, and they have been evaluated on different datasets.
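To make "a particular combination of symptoms" concrete, here is a minimal sketch of what a rule-based detector can look like. The rule itself (loss of smell or taste alone, otherwise at least two listed symptoms), the symptom keys, and the function name `rule_based_positive` are all hypothetical illustrations; they are not one of the rules evaluated in the paper and not UMD-CTIS item names.

```python
from typing import Mapping

# Hypothetical symptom keys chosen for illustration; they are NOT the
# UMD-CTIS item names and not a rule taken from the paper.
SYMPTOMS = (
    "fever",
    "cough",
    "loss_of_smell_taste",
    "stuffy_runny_nose",
    "aches_muscle_pain",
)


def rule_based_positive(report: Mapping[str, bool], min_symptoms: int = 2) -> bool:
    """Apply a hypothetical symptom-combination rule to one self-report.

    Loss of smell or taste alone flags the respondent as likely positive;
    otherwise at least `min_symptoms` of the listed symptoms are required.
    """
    if report.get("loss_of_smell_taste", False):
        return True
    reported = sum(1 for s in SYMPTOMS if report.get(s, False))
    return reported >= min_symptoms


if __name__ == "__main__":
    print(rule_based_positive({"fever": True, "cough": True}))   # True
    print(rule_based_positive({"stuffy_runny_nose": True}))      # False
    print(rule_based_positive({"loss_of_smell_taste": True}))    # True
```

Rules of this kind are cheap and transparent, which is why they were attractive as quick screening tools; their weakness is that a fixed combination cannot adapt to how symptom patterns differ between countries and periods.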

Purpose: This paper presents a comprehensive comparison of COVID-19 detection methods based on self-reported information, using data from the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS), a large health surveillance platform launched in partnership with Facebook.

Methods: Detection methods were implemented to identify COVID-19-positive cases among UMD-CTIS participants who reported at least one symptom and a recent antigen test result (positive or negative), for six countries and two periods. The methods fall into three categories: rule-based approaches, logistic regression techniques, and tree-based machine-learning models. They were evaluated using several metrics, including F1-score, sensitivity, specificity, and precision. An explainability analysis was also conducted to compare the methods.
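The four evaluation metrics all derive from the confusion matrix of predicted versus reported antigen-test outcomes. The sketch below assumes binary labels (1 = positive test, 0 = negative) and uses hypothetical function names; returning 0.0 when a denominator is empty is a convention chosen here, not something the abstract specifies.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); assumes labels are 1 = positive test, 0 = negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: a missed positive
            fn += 1
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    # Report 0.0 instead of dividing by zero when a class is absent.
    return num / den if den else 0.0


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),    # share of positives that are detected
        "specificity": _ratio(tn, tn + fp),    # share of negatives correctly cleared
        "precision": _ratio(tp, tp + fp),      # share of flagged cases truly positive
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and sensitivity
    }


if __name__ == "__main__":
    print(detection_metrics([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))
    # {'sensitivity': 0.5, 'specificity': 0.6666666666666666, 'precision': 0.5, 'f1': 0.5}
```

The F1-score is written as 2TP / (2TP + FP + FN), which is algebraically the harmonic mean of precision and sensitivity but avoids a separate zero check on their sum. Because it ignores true negatives, it is less inflated than accuracy when negative tests dominate the sample.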

Results: Fifteen methods were evaluated for six countries and two periods. We identify the best method in each category: rule-based methods (F1-score: 51.48% - 71.11%), logistic regression techniques (F1-score: 39.91% - 71.13%), and tree-based machine-learning models (F1-score: 45.07% - 73.72%). According to the explainability analysis, the relevance of the reported symptoms for COVID-19 detection varies between countries and years. However, two variables are consistently relevant across approaches: stuffy or runny nose, and aches or muscle pain.
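The abstract does not say how symptom relevance is computed, so the sketch below swaps in a simple feature-ablation measure, which is not the paper's explainability method: each symptom is set to "not reported" for every respondent and the resulting drop in F1-score is recorded. The logistic weights, bias, and five-row dataset are hypothetical toy values, so the ranking printed here reflects those made-up weights only and is not evidence for the paper's findings.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]

# Hypothetical coefficients for illustration only; NOT fitted values
# from the paper or from UMD-CTIS data.
WEIGHTS = {"stuffy_runny_nose": 1.2, "aches_muscle_pain": 0.9, "cough": 0.4}
BIAS = -1.5

# Hypothetical toy respondents (1 = symptom reported) and test outcomes.
ROWS: List[Row] = [
    {"stuffy_runny_nose": 1, "aches_muscle_pain": 1, "cough": 0},
    {"stuffy_runny_nose": 1, "aches_muscle_pain": 0, "cough": 1},
    {"stuffy_runny_nose": 0, "aches_muscle_pain": 1, "cough": 1},
    {"stuffy_runny_nose": 0, "aches_muscle_pain": 0, "cough": 1},
    {"stuffy_runny_nose": 0, "aches_muscle_pain": 0, "cough": 0},
]
Y_TRUE = [1, 1, 1, 0, 0]


def logistic_model(row: Row) -> int:
    """Predict 1 (likely positive) when the logistic probability is >= 0.5."""
    z = BIAS + sum(w * row.get(name, 0) for name, w in WEIGHTS.items())
    return int(1.0 / (1.0 + math.exp(-z)) >= 0.5)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ablation_relevance(
    model: Callable[[Row], int],
    rows: List[Row],
    y_true: Sequence[int],
    features: Sequence[str],
) -> Dict[str, float]:
    """F1 drop when one symptom is set to 'not reported' for every respondent."""
    baseline = f1_score(y_true, [model(r) for r in rows])
    relevance = {}
    for name in features:
        ablated = [{**r, name: 0} for r in rows]
        relevance[name] = baseline - f1_score(y_true, [model(r) for r in ablated])
    return relevance


if __name__ == "__main__":
    rel = ablation_relevance(logistic_model, ROWS, Y_TRUE, list(WEIGHTS))
    # On this toy data, stuffy_runny_nose gets the largest F1 drop.
    for name, drop in sorted(rel.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {drop:.2f}")
```

Ablation is model-agnostic, so the same loop applies unchanged to a rule-based or tree-based predictor; attribution methods that score individual respondents can instead explain why a specific person was flagged.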

Conclusions: Evaluating detection methods of all three categories on homogeneous data across countries and years provides a solid and consistent comparison. An explainability analysis of a tree-based machine-learning model can help identify infected individuals based specifically on their relevant symptoms. This study is limited by the self-reported nature of the data, which cannot replace clinical diagnosis.

Keywords: COVID-19 detection methods; Explainability analysis; F1-score; Logistic regression methods; Rule-based methods; Tree-based models.

Publication types

Review
Research Support, Non-U.S. Gov't

MeSH terms

COVID-19* / diagnosis
Humans
Machine Learning
Self Report