VirusTaxo: Taxonomic classification of viruses from the genome sequence using k-mer enrichment

Genomics. 2022 Jul;114(4):110414. doi: 10.1016/j.ygeno.2022.110414. Epub 2022 Jun 17.

Abstract

Classification of viruses into their taxonomic ranks (e.g., order, family, and genus) provides a framework for organizing the abundant and diverse population of viruses. Next-generation metagenomic sequencing technologies have led to a rapid increase in viral sequencing data, which requires bioinformatics tools for taxonomic analysis. Many metagenomic taxonomy classifiers have been developed to study microbiomes, but assigning taxonomy to diverse virus sequences remains particularly challenging, and there is a growing need for dedicated methods optimized to classify virus sequences into their taxa. For taxonomic classification of viruses from metagenomic sequences, we developed VirusTaxo using a diverse set of virus genera (402 DNA and 280 RNA). VirusTaxo has an average accuracy of 93% for genus-level prediction in DNA and RNA viruses. VirusTaxo outperformed existing taxonomic classifiers of viruses, assigning taxonomy to a larger fraction of metagenomic contigs than other methods. Benchmarking of VirusTaxo on a collection of SARS-CoV-2 sequencing libraries and metavirome datasets suggests that VirusTaxo can characterize virus taxonomy from highly diverse contigs and provide a reliable decision on the taxonomy of viruses.
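The abstract does not describe the algorithm itself, so the following is only a minimal sketch of how genus-level classification by k-mer enrichment might work in general, not VirusTaxo's actual implementation. It assumes that k-mers found in the reference genomes of exactly one genus are treated as discriminative, and that a query contig is assigned to the genus contributing the largest share of its discriminative k-mer hits. The function names, the value of k, and the `min_share` threshold are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each genus-specific k-mer to its genus.

    references: dict of genus name -> list of reference sequences.
    k-mers that occur in more than one genus are discarded (assumption).
    """
    owners = defaultdict(set)
    for genus, seqs in references.items():
        for seq in seqs:
            for km in kmers(seq, k):
                owners[km].add(genus)
    return {km: next(iter(g)) for km, g in owners.items() if len(g) == 1}


def classify(query, index, k, min_share=0.5):
    """Assign a genus to a query contig, or None if no confident call.

    min_share is a hypothetical threshold: the winning genus must account
    for at least this fraction of the query's discriminative k-mer hits.
    """
    hits = Counter(index[km] for km in kmers(query, k) if km in index)
    if not hits:
        return None
    genus, count = hits.most_common(1)[0]
    return genus if count / sum(hits.values()) >= min_share else None
```

With toy references such as `{"GenusA": ["ACGTACGTAC"], "GenusB": ["TTTTGGGGCC"]}` and `k=4`, a query drawn from the first sequence is assigned to `GenusA`, while a query sharing no indexed k-mers is left unclassified. Real genomes would call for a much larger k and for handling of reverse complements, which this sketch omits.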

Keywords: Genome; Hierarchical classification; Taxonomy; Virus; k-mer.

MeSH terms

  • COVID-19*
  • Humans
  • Metagenome
  • Metagenomics / methods
  • Phylogeny
  • SARS-CoV-2 / genetics
  • Viruses* / genetics