Large local analysis of the unaligned genome and its application

J Comput Biol. 2013 Jan;20(1):19-29. doi: 10.1089/cmb.2011.0052.

Abstract

We describe a novel method for the local analysis of complete genomes. We propose a local distance measure called LODIST, which is based on the relationship between the longest common words and the shortest absent words of the two genomes being compared. LODIST can perform better than local alignment when the local region is large enough to cover some recombination genes. A distance measure called SILD.k.t, with resolution k and step t, is derived from the integral of LODISTs over whole genomes. We show that the algorithm for computing the LODISTs and SILD.k.t runs in linear time, which is fast enough for genome comparison. We verify the method by recognizing the subtypes of HIV-1 complete genomes and genome segments.
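The two word statistics named in the abstract can be illustrated with a minimal sketch. This is not the LODIST formula, which the abstract does not give, and it is not the linear-time algorithm the authors report. The function names longest_common_word_length and shortest_absent_word_length are hypothetical. The sketch also assumes an A/C/G/T alphabet and reads "shortest absent word" as the shortest word over that alphabet that does not occur in a single sequence.

```python
def longest_common_word_length(a: str, b: str) -> int:
    """Length of the longest word (contiguous substring) shared by a and b.

    Plain quadratic dynamic programming, chosen for clarity. A suffix-tree
    or suffix-automaton approach would be needed for linear time.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common word ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def shortest_absent_word_length(seq: str, alphabet: str = "ACGT") -> int:
    """Smallest k such that at least one k-letter word over the alphabet
    does not occur in seq (assumed reading of 'shortest absent word')."""
    letters = set(alphabet)
    k = 1
    while True:
        # Collect the distinct k-letter words of seq, ignoring words that
        # contain symbols outside the alphabet (for example 'N').
        seen = {
            seq[i:i + k]
            for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= letters
        }
        if len(seen) < len(letters) ** k:
            return k
        k += 1


if __name__ == "__main__":
    print(longest_common_word_length("ACGTACGT", "TTACGTAA"))
    print(shortest_absent_word_length("ACGT"))
```

Applying the same two functions to windows of the genomes rather than to whole sequences would give local values of these statistics. How the authors combine them into LODIST and SILD.k.t is described in the full paper, not here.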

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • Computational Biology
  • DNA, Viral / genetics
  • Genome, Viral
  • Genomics / statistics & numerical data*
  • HIV-1 / classification
  • HIV-1 / genetics
  • Mathematical Concepts
  • Phylogeny
  • Recombination, Genetic
  • Sequence Alignment / statistics & numerical data

Substances

  • DNA, Viral