New powerful statistics for alignment-free sequence comparison under a pattern transfer model

J Theor Biol. 2011 Sep 7;284(1):106-16. doi: 10.1016/j.jtbi.2011.06.020. Epub 2011 Jun 25.

Abstract

Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D*2 and D(s)2 showed that, under a pattern transfer model, their power approaches a limit smaller than 1 as the sequence length tends to infinity. We develop new alignment-free statistics based on D2, D*2 and D(s)2 by comparing local sequence pairs and then summing over all the local sequence pairs of a certain length. We show that the new statistics are much more powerful than the corresponding original statistics and that their power tends to 1 as the sequence length tends to infinity under the pattern transfer model.
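
As a rough illustration of the idea described above, the sketch below computes the standard D2 statistic (the sum, over all words of length k, of the product of the word counts in the two sequences) and then a local variant that sums D2 over all pairs of fixed-length windows. The function names, the `window` parameter, and the exact windowing scheme are assumptions made for illustration; the abstract does not give the paper's precise definitions, and the normalized variants D*2 and D(s)2, which require a background word-probability model, are not shown.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2: sum over words w of (count of w in A) * (count of w in B)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items())


def local_d2_sum(seq_a, seq_b, k, window):
    """Sum D2 over every pair of length-`window` substrings.

    Assumed scheme for illustration only: all overlapping windows of A
    are paired with all overlapping windows of B.
    """
    wins_a = [seq_a[i:i + window] for i in range(len(seq_a) - window + 1)]
    wins_b = [seq_b[j:j + window] for j in range(len(seq_b) - window + 1)]
    return sum(d2(a, b, k) for a in wins_a for b in wins_b)


if __name__ == "__main__":
    print(d2("ACGTACGT", "TTACGTAA", 3))
    print(local_d2_sum("ACGTACGT", "TTACGTAA", 3, 5))
```

This brute-force version recomputes word counts for every window pair and is meant only to make the construction concrete; a practical implementation would update counts incrementally as the windows slide.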

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Data Interpretation, Statistical
  • Drosophila / genetics
  • Evolution, Molecular
  • HIV-1 / genetics
  • Models, Statistical
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid