Evaluating Time Influence over Performance of Machine-Learning-Based Diagnosis: A Case Study of COVID-19 Pandemic in Brazil

Julliana Gonçalves Marques; Luiz Affonso Guedes; Márjory Cristiany da Costa Abreu

doi:10.3390/ijerph20010136

Int J Environ Res Public Health. 2022 Dec 22;20(1):136. doi: 10.3390/ijerph20010136.

Authors

Julliana Gonçalves Marques¹, Luiz Affonso Guedes², Márjory Cristiany da Costa Abreu³

Affiliations

¹ Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil.
² Department of Computer Engineering and Automation, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil.
³ Department of Computing, Sheffield Hallam University, Sheffield S9 3TY, UK.

Abstract

Efficiently recognising the symptoms of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection enables quick and accurate diagnosis and helps mitigate the spread of coronavirus disease 2019 (COVID-19). However, the emergence of new variants has caused constant changes in the symptoms associated with COVID-19. These changes directly affect the performance of machine-learning-based diagnosis, so accurate diagnosis requires accounting for how symptoms change over time. In this study, we propose a machine-learning-based approach for diagnosing COVID-19 that considers the importance of time in model predictions. Our approach analyses the performance of XGBoost using two time-based strategies for model training: a month-to-month strategy and an accumulated strategy. The model was evaluated using standard metrics: accuracy, precision, and recall. Furthermore, to explain the impact of feature changes on model prediction, feature importance was measured using SHAP, an explainable AI (XAI) technique. Our results indicate that considering time when creating a COVID-19 diagnostic prediction model is advantageous.
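The abstract names two time-based training strategies and three evaluation metrics but does not define them in detail. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that "month-to-month" means training on one month and testing on the next, and that "accumulated" means training on all months up to a given month and testing on the next. The function names (`month_to_month_splits`, `accumulated_splits`, `binary_metrics`) and the month labels are hypothetical, and the XGBoost model and SHAP analysis are deliberately left out.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Split = Tuple[List[str], str]  # (months used for training, month used for testing)


def month_to_month_splits(months: Sequence[str]) -> Iterator[Split]:
    """Assumed reading: train on a single month, test on the following month."""
    for i in range(len(months) - 1):
        yield [months[i]], months[i + 1]


def accumulated_splits(months: Sequence[str]) -> Iterator[Split]:
    """Assumed reading: train on every month seen so far, test on the following month."""
    for i in range(len(months) - 1):
        yield list(months[: i + 1]), months[i + 1]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = positive diagnosis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision and recall fall back to 0.0 when their denominator is zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical month labels, for illustration only.
    months = ["2020-03", "2020-04", "2020-05"]
    for train, test in month_to_month_splits(months):
        print("month-to-month:", train, "->", test)
    for train, test in accumulated_splits(months):
        print("accumulated:   ", train, "->", test)
    print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a fuller pipeline, each split would select the rows of a symptom dataset belonging to the training and test months, fit a classifier on the former, and score its predictions on the latter with `binary_metrics`. Comparing the per-month scores of the two strategies is one way to see whether older data helps or hurts as symptoms shift.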

Keywords: COVID-19; diagnosis; eXplainable AI; feature importance; machine learning.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Brazil / epidemiology
COVID-19* / diagnosis
COVID-19* / epidemiology
Humans
Machine Learning
Pandemics
SARS-CoV-2

Grants and funding

This research was funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), grant number 001.