ViTAL: Vision TrAnsformer based Low coverage SARS-CoV-2 lineage assignment

Bioinformatics. 2024 Mar 4;40(3):btae093. doi: 10.1093/bioinformatics/btae093.

Abstract

Motivation: Rapid spread of viral diseases such as Coronavirus disease 2019 (COVID-19) highlights an urgent need for efficient surveillance of virus mutation and transmission dynamics, which requires fast, inexpensive and accurate viral lineage assignment. The first two goals might be achieved through low-coverage whole-genome sequencing (LC-WGS) which enables rapid genome sequencing at scale and at reduced costs. Unfortunately, LC-WGS significantly diminishes the genomic details, rendering accurate lineage assignment very challenging.

Results: We present ViTAL, a novel deep learning algorithm specifically designed to perform lineage assignment of low coverage-sequenced genomes. ViTAL utilizes a combination of MinHash for genomic feature extraction and Vision Transformer for fine-grain genome classification and lineage assignment. We show that ViTAL outperforms state-of-the-art tools across diverse coverage levels, reaching up to 87.7% lineage assignment accuracy at 1× coverage where state-of-the-art tools such as UShER and Kraken2 achieve the accuracy of 5.4% and 27.4% respectively. ViTAL achieves comparable accuracy results with up to 8× lower coverage than state-of-the-art tools. We explore ViTAL's ability to identify the lineages of novel genomes, i.e. genomes the Vision Transformer was not trained on. We show how ViTAL can be applied to preliminary phylogenetic placement of novel variants.

Availability and implementation: The data underlying this article are available in https://github.com/zuherJahshan/vital and can be accessed with 10.5281/zenodo.10688110.

MeSH terms

  • Algorithms
  • COVID-19*
  • Chromosome Mapping
  • Humans
  • Phylogeny
  • SARS-CoV-2* / genetics

Grants and funding