Optimizing the Illumina COVIDSeq laboratorial and bioinformatics pipeline on thousands of samples for SARS-CoV-2 Variants of Concern tracking

Comput Struct Biotechnol J. 2022;20:2558-2563. doi: 10.1016/j.csbj.2022.05.033. Epub 2022 May 19.

Abstract

Tracking SARS-CoV-2 Variants of Concern through Whole Genome Sequencing is a pillar of the public health measures used to contain the pandemic. Following lineage distribution at local and global scale improves our understanding of immune escape and supports the adoption of interventions to contain new outbreaks. This scenario challenges NGS laboratories worldwide, which are pressed to deliver both faster turnaround times and high-throughput processing of swabs for sequencing and analysis. In this study, we present an optimization of the Illumina COVIDSeq protocol at both the wet-lab and dry-lab (bioinformatics) level, carried out on thousands of SARS-CoV-2 samples. We discuss the challenges specific to processing hundreds of swabs per week, such as the trade-off between ultra-high sensitivity and contamination levels in negative controls, cost efficiency, and bioinformatics quality metrics.

Keywords: BAM, Binary Alignment Map; BED, Browser Extensible Data; Bioinformatics workflow; COVID mutations; FDA, Food and Drug Administration; HPC, High Performance Computing; Illumina COVID-seq; LIMS, Laboratory Information Management System; NGS, Next Generation Sequencing; Oncology; Oncology Metagenomics; RBD, Receptor-Binding Domain; SARS-CoV-2 Variants of Concern; SARS-CoV-2 genome; SARS-CoV-2 mutation; SARS-CoV-2, Severe Acute Respiratory Syndrome Coronavirus 2; TAT, Turnaround Time; VoC, Variants of Concern.