Viral genome phylogeny based on Lempel-Ziv complexity and Hausdorff distance

J Theor Biol. 2014 May 7:348:12-20. doi: 10.1016/j.jtbi.2014.01.022. Epub 2014 Jan 29.

Abstract

In this paper, we develop a novel method to study viral genome phylogeny. We apply Lempel-Ziv complexity to define a distance between two nucleic acid sequences. Based on this distance, we then use the Hausdorff distance (HD) and a modified Hausdorff distance (MHD) to perform phylogenetic analysis of multi-segmented viral genomes. The results show that the MHD yields more accurate phylogenetic relationships than the HD. Our method compares all multi-segmented genomes globally and simultaneously; that is, we treat each multi-segmented viral genome as a whole in the comparative analysis. The method is not affected by the number or order of segments, and every segment contributes to the phylogeny of the whole genome. We have analyzed several groups of real multi-segmented genomes from different viral families. The results show that our method provides a powerful new tool for studying the classification of viral genomes and their phylogenetic relationships.
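The abstract does not give the formulas, so the following Python sketch rests on stated assumptions rather than on the paper's exact definitions. LZ complexity c(S) is taken as the Lempel-Ziv (1976) count of components in the exhaustive history of S. The sequence distance is assumed to be the normalized form d(S,Q) = max{c(SQ) - c(S), c(QS) - c(Q)} / max{c(SQ), c(QS)}. The MHD is assumed to be the common variant that replaces the outer maximum of each directed Hausdorff distance with a mean, so every segment contributes. All function names are hypothetical. Each genome is represented as a list of segment sequences, which is why the result does not depend on segment order or count.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def lz76_complexity(s: str) -> int:
    """Count components in the Lempel-Ziv (1976) exhaustive history of s.

    Each new component is the shortest prefix of the unparsed remainder that
    cannot be copied from the text preceding its own last symbol. This is a
    slow reference version for illustration only.
    """
    n = len(s)
    c, p = 0, 0
    while p < n:
        q = p + 1
        # Extend the component while it still occurs earlier in the text.
        while q <= n and s[p:q] in s[: q - 1]:
            q += 1
        c += 1
        p = q
    return c


def lz_distance(s: str, q: str) -> float:
    """ASSUMED normalized LZ distance; the paper's exact form may differ."""
    if not s and not q:
        return 0.0
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)


def _nearest(a: T, others: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Distance from one segment to its closest segment in the other genome."""
    return min(dist(a, b) for b in others)


def hausdorff(A: Sequence[T], B: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Classic HD: the worst-matched segment in either genome decides."""
    h_ab = max(_nearest(a, B, dist) for a in A)
    h_ba = max(_nearest(b, A, dist) for b in B)
    return max(h_ab, h_ba)


def modified_hausdorff(A: Sequence[T], B: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Averaged HD variant (assumed): each segment's nearest distance counts."""
    h_ab = sum(_nearest(a, B, dist) for a in A) / len(A)
    h_ba = sum(_nearest(b, A, dist) for b in B) / len(B)
    return max(h_ab, h_ba)


# Usage with toy, made-up segment sequences (not data from the paper):
genome_a = ["ACGTACGTTG", "GGCATTACGA", "TTTTGCA"]
genome_b = ["GGCATTACGT", "ACGTACGTTC"]
print(hausdorff(genome_a, genome_b, lz_distance))
print(modified_hausdorff(genome_a, genome_b, lz_distance))
```

For a set of genomes, the pairwise MHD values would form a distance matrix that could be passed to a tree-building method such as neighbor joining or UPGMA; the abstract does not say which method the paper used. Note that this assumed LZ distance is not exactly zero for identical sequences (for example, c("ACGTACGT") - c("ACGT") = 1), although it becomes small relative to the complexity for long sequences.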

Keywords: Global comparison; Multi-segmented; Single-segmented; Virus classification.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • DNA, Viral / genetics
  • Databases, Nucleic Acid
  • Genome, Viral*
  • HIV-1 / classification
  • HIV-1 / genetics
  • Phylogeny
  • Sequence Analysis, DNA / methods*
  • Simian Immunodeficiency Virus / classification
  • Simian Immunodeficiency Virus / genetics

Substances

  • DNA, Viral