Bioinformatic Approaches for Comparative Analysis of Viruses

Methods Mol Biol. 2018;1704:401-417. doi: 10.1007/978-1-4939-7463-4_15.

Abstract

The field of viral genomics has experienced an unprecedented increase in data volume. New strains of known viruses are constantly being added to GenBank, as are entirely new species with little or no resemblance to the sequences in our databases. Metagenomic techniques have the potential to further increase both the number of sequenced genomes and the rate at which they are produced. Moreover, viruses have a set of unique features that often break molecular biology dogmas, e.g., the flow of genetic information from RNA to DNA in retroviruses and the use of RNA molecules as genomes. As a result, extracting meaningful information from viral genomes remains a challenge, and standard methods for comparing unknown sequences against databases of characterized sequences may need to be modified. Several bioinformatic approaches and tools have therefore been developed to address the challenge of analyzing viral data. In this chapter, we describe, and provide protocols for, some of the most important bioinformatic techniques for the comparative analysis of viruses. We also discuss how the unique features of viruses can affect standard analyses and how to overcome some of the major sources of problems. Topics include: (1) clustering of related genomes, (2) whole-genome multiple sequence alignments for small RNA viruses, (3) protein alignments for marker genes, (4) analyses based on ortholog groups, and (5) taxonomic identification and comparison of viruses from environmental datasets.
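As a rough illustration of topic (1), clustering of related genomes, the sketch below groups sequences by the fraction of k-mers they share (Jaccard similarity) and joins any pair above a threshold with single-linkage clustering. This is a generic alignment-free technique chosen for brevity. It is not the protocol described in the chapter. The k-mer size, the 0.5 threshold, the function names, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_genomes(genomes: dict[str, str], k: int = 5,
                    threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clustering with union-find: two genomes end up in the
    same cluster if a chain of pairs with Jaccard >= threshold connects them.
    k and threshold are hypothetical defaults, not values from the chapter."""
    names = list(genomes)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kmers = {n: kmer_set(genomes[n], k) for n in names}
    for a, b in combinations(names, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy input, not real viral genomes.
toy = {
    "strainA": "ATGCGTACGTTAGC",
    "strainB": "ATGCGTACGTTAGG",  # differs from strainA only at the last base
    "novel": "TTTTTTTTTTTTTT",    # shares no 5-mers with the other two
}
print(cluster_genomes(toy))  # [['novel'], ['strainA', 'strainB']]
```

Here strainA and strainB share 9 of their 11 distinct 5-mers (Jaccard about 0.82), so they cluster together, while the unrelated sequence stays alone. Real viral data would also need to handle reverse complements and genome segmentation, which this sketch ignores.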

Keywords: BLAST; Bioinformatics; Comparative analysis; Genomics; Metagenomics; Multiple sequence alignment; Ortholog groups; VOCs; Viral genomes; Viromes; Virus.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Computational Biology / methods*
  • Genome, Viral*
  • Metagenomics*
  • Phylogeny
  • Sequence Homology
  • Software
  • Viruses / classification
  • Viruses / genetics*