Genomic signatures in viral sequences by in-frame and out-frame mutual information

J Theor Biol. 2016 Aug 21;403:1-9. doi: 10.1016/j.jtbi.2016.05.014. Epub 2016 May 10.

Abstract

To understand the unique biology of viruses, we use the Mutual Information Function (MIF) to characterize 792 viral sequences comprising 458 viral whole genomes. A 3-base periodicity (3-bp) was observed only in DNA viruses, whereas RNA viruses showed irregular patterns. The correlation of MIF values at frequencies of 3 bp (in-frame) with those at frequencies of 4 and 5 bp (out-frame) proved useful for distinguishing viruses according to their respective taxonomic order, and according to whether they pertain to any of the three kingdoms: Eubacteria, Archaea and Eukarya. Viruses were clustered using a new statistic, namely the pair of in-frame and out-frame values of the MIF. The resulting clustering was entirely consistent with the current viral taxonomy. As a result, we were able to compare both viral and cellular genomes in a single plot, unlike any given phylogenetic reconstruction.
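As an illustration of the in-frame and out-frame statistic described in the abstract, here is a minimal Python sketch. It assumes the common definition of mutual information between bases separated by k positions, I(k) = sum over base pairs (a, b) of p_ab(k) * log2(p_ab(k) / (p_a * p_b)); the abstract does not state the exact formula used. The function names `mutual_information` and `in_out_frame` are hypothetical, and averaging the values at k = 4 and k = 5 into a single out-frame number is an assumption made here for illustration, not necessarily the authors' procedure.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (in bits) between bases k positions apart.

    Assumed definition: I(k) = sum_ab p_ab(k) * log2(p_ab(k) / (p_a * p_b)),
    with the marginals estimated from the same set of position pairs.
    """
    seq = seq.upper()
    n = len(seq) - k  # number of (i, i + k) position pairs
    if k <= 0 or n <= 0:
        raise ValueError("k must be positive and shorter than the sequence")

    pairs = Counter((seq[i], seq[i + k]) for i in range(n))
    left = Counter(seq[i] for i in range(n))       # counts of the first base
    right = Counter(seq[i + k] for i in range(n))  # counts of the second base

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi


def in_out_frame(seq, in_k=3, out_ks=(4, 5)):
    """Return the (in-frame, out-frame) pair of MIF values for one sequence.

    Assumption: the out-frame value is the mean of I(k) over out_ks.
    """
    in_frame = mutual_information(seq, in_k)
    out_frame = sum(mutual_information(seq, k) for k in out_ks) / len(out_ks)
    return in_frame, out_frame
```

Under these assumptions, calling `in_out_frame(genome)` on each sequence yields one point per genome, and the points can be drawn on a single scatter plot or passed to any standard clustering routine.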

Keywords: Mutual Information Function; Viral space; Virus taxonomy.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • DNA, Viral / genetics
  • Genome, Archaeal / genetics
  • Genome, Bacterial / genetics
  • Genome, Viral / genetics*
  • Viruses / genetics*

Substances

  • DNA, Viral