Virus finding tools: current solutions and limitations

Brief Bioinform. 2022 Jul 18;23(4):bbac235. doi: 10.1093/bib/bbac235.

Abstract

Motivation: The study of the Human Virome remains challenging nowadays. Viral metagenomics, through high-throughput sequencing data, is the best choice for virus discovery. The metagenomics approach is culture-independent and sequence-independent, helping search for either known or novel viruses. Though it is estimated that more than 40% of the viruses found in metagenomics analysis are not recognizable, we decided to analyze several tools to identify and discover viruses in RNA-seq samples.

Results: We have analyzed eight Virus Tools for the identification of viruses in RNA-seq data. These tools were compared using a synthetic dataset of 30 viruses and a real one. Our analysis shows that no tool succeeds in recognizing all the viruses in the datasets. So we can conclude that each of these tools has pros and cons, and their choice depends on the application domain.

Availability: Synthetic data used through the review and raw results of their analysis can be found at https://zenodo.org/record/6426147. FASTQ files of real data can be found in GEO (https://www.ncbi.nlm.nih.gov/gds) or ENA (https://www.ebi.ac.uk/ena/browser/home). Raw results of their analysis can be downloaded from https://zenodo.org/record/6425917.

Keywords: Benchmarking; Metagenomics; RNA-seq; Virus Detection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Metagenomics
  • Viruses* / genetics