Wastewater-Based Epidemiology to Describe the Evolution of SARS-CoV-2 in the South-East of Spain, and Application of Phylogenetic Analysis and a Machine Learning Approach

Viruses. 2023 Jul 3;15(7):1499. doi: 10.3390/v15071499.

Abstract

The COVID-19 pandemic has posed a significant global threat, prompting several initiatives for its control and management. One such initiative is wastewater-based epidemiology, which has gained attention for its potential to provide early warning of virus outbreaks and real-time information on viral spread. In this study, wastewater samples from two wastewater treatment plants (WWTPs) in the southeast of Spain (Region of Murcia), namely Murcia and Cartagena, were analyzed using RT-qPCR and high-throughput sequencing to describe the evolution of SARS-CoV-2 in the South-East of Spain. Additionally, phylogenetic analysis and machine learning approaches were applied to develop a pre-screening tool for identifying differences in variant composition among wastewater samples. The results confirmed that SARS-CoV-2 levels in these wastewater samples changed in line with the number of SARS-CoV-2 cases detected in the population, and variant occurrences were consistent with clinically reported data. The sequence analyses helped describe how the different SARS-CoV-2 variants replaced one another over time. The phylogenetic analysis also showed that samples obtained at close sampling times were more similar to each other than samples obtained further apart in time. A second analysis, using a machine learning approach based on the mutations found in the SARS-CoV-2 spike protein, was also conducted, with hierarchical clustering (HC) used as an efficient unsupervised approach for data analysis. The results indicated that samples obtained in October 2022 in Murcia and Cartagena were significantly different, which corresponded well with the different virus variants circulating in the two locations.
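The abstract does not state which distance measure or linkage the authors used for hierarchical clustering, so the following is only a minimal sketch under assumed choices: each sample is encoded as the set of spike mutations detected in it, samples are compared with the Jaccard distance, and clusters are merged by average (UPGMA-style) linkage until a chosen number of clusters remains. The function names and the sample profiles are hypothetical toy data for illustration, not results from the study.

```python
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A & B| / |A | B|; two empty mutation sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(samples: dict[str, frozenset], n_clusters: int) -> list[list[str]]:
    """Agglomerative clustering with average linkage, stopped at n_clusters."""
    names = list(samples)
    # Precompute every pairwise sample distance once.
    dist = {(x, y): jaccard_distance(samples[x], samples[y])
            for x, y in combinations(names, 2)}

    def d(x: str, y: str) -> float:
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    clusters = [[n] for n in names]  # start with one cluster per sample
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            ci, cj = clusters[i], clusters[j]
            # Average linkage: mean distance over all cross-cluster pairs.
            avg = sum(d(x, y) for x in ci for y in cj) / (len(ci) * len(cj))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical spike mutation profiles for two sites at two time points.
profiles = {
    "site1_t1": frozenset({"D614G", "L452R", "F486V"}),
    "site1_t2": frozenset({"D614G", "L452R", "F486V", "R346T"}),
    "site2_t1": frozenset({"D614G", "K444T", "N460K", "L452R", "F486V"}),
    "site2_t2": frozenset({"D614G", "K444T", "N460K", "L452R", "F486V", "R346T"}),
}

print(average_linkage(profiles, n_clusters=2))
# → [['site2_t1', 'site2_t2'], ['site1_t1', 'site1_t2']]
```

With these toy profiles, samples sharing a site-specific set of mutations group together, which is the kind of separation a pre-screening clustering step is meant to flag. A production analysis would more likely use a library implementation such as SciPy's hierarchical clustering and inspect the full dendrogram rather than a fixed cluster count.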
The methods proposed in this study are suitable for comparing SARS-CoV-2 consensus sequences as a preliminary evaluation of potential changes in the variants circulating in a given population at a specific time point.
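One simple way to quantify how similar two consensus sequences are, and a common first step before building a distance-based tree, is the pairwise p-distance: the proportion of aligned sites at which two sequences differ. The sketch below is not the authors' phylogenetic method, which the abstract does not detail. It assumes the consensus sequences are already aligned to the same length and skips positions where either sequence has an `N` or a gap (for example, low-coverage sites masked in a wastewater consensus). The function names and the short sequences are hypothetical.

```python
def p_distance(seq1: str, seq2: str, ignore: str = "N-") -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping positions where either sequence has a character in `ignore`."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in ignore or b in ignore:
            continue  # masked or gapped site: not informative
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Upper-triangle p-distance matrix keyed by (name_i, name_j)."""
    names = list(seqs)
    return {(x, y): p_distance(seqs[x], seqs[y])
            for i, x in enumerate(names) for y in names[i + 1:]}


# Hypothetical aligned consensus fragments from three sampling months.
consensus = {
    "month1": "ACGTACGTAC",
    "month2": "ACGTACGTTC",
    "month6": "ACTTAGGTTA",
}

print(distance_matrix(consensus))
# → {('month1', 'month2'): 0.1, ('month1', 'month6'): 0.4, ('month2', 'month6'): 0.3}
```

In this toy example the two samples closest in time have the smallest distance, mirroring the kind of temporal pattern the abstract reports; real SARS-CoV-2 genomes would first be aligned to a reference, and tree inference would normally use a dedicated phylogenetics tool.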

Keywords: SARS-CoV-2; epidemiology; machine learning approach; molecular virology; phylogenetic analysis; wastewater-based epidemiology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / epidemiology
  • Humans
  • Machine Learning
  • Pandemics
  • Phylogeny
  • SARS-CoV-2* / genetics
  • Spain / epidemiology
  • Wastewater
  • Wastewater-Based Epidemiological Monitoring

Substances

  • spike protein, SARS-CoV-2
  • Wastewater

Supplementary concepts

  • SARS-CoV-2 variants

Grants and funding

This research was supported by the European Commission NextGenerationEU fund, through CSIC’s Global Health Platform (PTI Salud Global CSIC), the VATar COVID 19, the COVI+D Program Region of Murcia (Fundación Séneca) and ESAMUR. IATA-CSIC is a Centre of Excellence Severo Ochoa (CEX2021-001189-S MCIN/AEI/10.13039/501100011033). EC-F is the recipient of a postdoctoral contract from the MICINN Call 2018 (PRE2018-083753). PT holds a Ramón y Cajal contract from the Ministerio de Ciencia e Innovación.