Inferring the Main Drivers of SARS-CoV-2 Global Transmissibility by Feature Selection Methods

Geohealth. 2021 Sep 1;5(9):e2021GH000432. doi: 10.1029/2021GH000432. eCollection 2021 Sep.

Abstract

Identifying the main environmental drivers of SARS-CoV-2 transmissibility in the population is crucial for understanding current and potential future outbreaks of COVID-19 and other infectious diseases. To address this problem, we concentrate on the basic reproduction number R₀, which is not sensitive to testing coverage and represents transmissibility in the absence of social distancing and in a completely susceptible population. While many variables may potentially influence R₀, high correlation between these variables may obscure interpretation of the results. Consequently, we combine Principal Component Analysis with feature selection methods from several regression-based approaches to identify the main demographic and meteorological drivers behind R₀. We robustly find that a country's wealth/development (GDP per capita or Human Development Index) is the most important R₀ predictor at the global level, probably because it is a good proxy for the overall contact frequency in a population. This main effect is modulated by built-up area per capita (crowdedness in indoor space), onset of infection (likely related to increased awareness of infection risks), net migration, unhealthy lifestyle/living conditions including pollution, seasonality, and possibly BCG vaccination prevalence. We also argue that several variables that significantly correlate with transmissibility either do not directly influence R₀ or affect it differently than a naïve analysis would suggest.
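As a rough illustration of the general idea of combining Principal Component Analysis with regression-based feature selection, the sketch below uses purely synthetic data. It is not the authors' pipeline: the variable names (`wealth`, `gdp`, `hdi`, `unrelated`), the noise levels, the choice of Lasso as the regression-based selector, and the penalty `alpha=0.3` are all assumptions made for this example. Two highly correlated predictors are first collapsed into principal components, and a Lasso fit then indicates which components or variables retain a nonzero coefficient for the simulated basic reproduction number.

```python
import numpy as np

def standardize(a):
    """Center each column and scale it to unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def pca_scores(x):
    """Return principal-component scores and explained-variance ratios (via SVD).
    Expects x to be column-centered."""
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt.T, s**2 / np.sum(s**2)

def lasso_cd(x, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent, minimizing
    (1/2n) * ||y - x b||^2 + alpha * ||b||_1."""
    n, p = x.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - x @ b + x[:, j] * b[j]      # partial residual without feature j
            rho = x[:, j] @ r / n
            z = x[:, j] @ x[:, j] / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / z  # soft threshold
    return b

# Synthetic data (hypothetical): a latent "wealth" factor drives two correlated
# proxies and the simulated reproduction number; "unrelated" has no effect.
rng = np.random.default_rng(0)
n = 500
wealth = rng.normal(size=n)
gdp = wealth + 0.1 * rng.normal(size=n)
hdi = wealth + 0.1 * rng.normal(size=n)
unrelated = rng.normal(size=n)
r0 = 3.0 + 1.0 * wealth + 0.1 * rng.normal(size=n)

# Step 1: replace the correlated pair with its principal components.
scores, explained_ratio = pca_scores(standardize(np.column_stack([gdp, hdi])))

# Step 2: Lasso on [PC1, PC2, unrelated]; nonzero coefficients mark selected features.
features = standardize(np.column_stack([scores, unrelated]))
coef = lasso_cd(features, r0 - r0.mean(), alpha=0.3)

print("explained variance ratio:", explained_ratio)
print("Lasso coefficients [PC1, PC2, unrelated]:", coef)
```

In this toy setting the first component stands in for the shared wealth/development signal, which avoids having to attribute the effect to one of two nearly interchangeable predictors. The sign of a principal component is arbitrary, so only the magnitude of its coefficient is meaningful here.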

Keywords: COVID‐19 environmental dependence; basic reproduction number; disease spread risk factors; feature selection; principal component analysis; regression analysis.