Identifying Country-Level Risk Factors for the Spread of COVID-19 in Europe Using Machine Learning

Viruses. 2022 Mar 17;14(3):625. doi: 10.3390/v14030625.

Abstract

Coronavirus disease 2019 (COVID-19) has resulted in approximately 5 million deaths around the world with unprecedented consequences in people's daily routines and in the global economy. Despite vast increases in time and money spent on COVID-19-related research, there is still limited information about the factors at the country level that affected COVID-19 transmission and fatality in EU. The paper focuses on the identification of these risk factors using a machine learning (ML) predictive pipeline and an associated explainability analysis. To achieve this, a hybrid dataset was created employing publicly available sources comprising heterogeneous parameters from the majority of EU countries, e.g., mobility measures, policy responses, vaccinations, and demographics/generic country-level parameters. Data pre-processing and data exploration techniques were initially applied to normalize the available data and decrease the feature dimensionality of the data problem considered. Then, a linear ε-Support Vector Machine (ε-SVM) model was employed to implement the regression task of predicting the number of deaths for each one of the three first pandemic waves (with mean square error of 0.027 for wave 1 and less than 0.02 for waves 2 and 3). Post hoc explainability analysis was finally applied to uncover the rationale behind the decision-making mechanisms of the ML pipeline and thus enhance our understanding with respect to the contribution of the selected country-level parameters to the prediction of COVID-19 deaths in EU.

Keywords: COVID-19; data mining; explainability; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / epidemiology
  • Europe / epidemiology
  • Humans
  • Machine Learning
  • Risk Factors
  • Support Vector Machine