Similarity analysis of DNA sequences based on the weighted pseudo-entropy

J Comput Chem. 2011 Mar;32(4):675-80. doi: 10.1002/jcc.21656. Epub 2010 Oct 1.

Abstract

A DNA primary sequence is a string over the alphabet Ω = {a, c, g, t}. Using all 2-combinations of the set Ω, with repetition allowed, we transform a DNA primary sequence into a special sequence over a set of cardinality 10. With this 10-letter sequence we associate 10 nonnegative numerical sequences, from which we derive a 10-component vector by means of a weighted pseudo-entropy that reflects information about the elements of a sequence and, especially, the order relations among them. This new quantitative characterization of DNA sequences is sensitive to substitutions of string elements. An examination of the relationships among the β-globin genes of 15 species illustrates the utility of the proposed approach.
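The count of 10 follows from the number of 2-combinations with repetition of a 4-letter set, C(4+2-1, 2) = 10. The sketch below enumerates those 10 unordered pairs and shows one possible way to rewrite a DNA string over them. The abstract does not say how the transformation is carried out, so the use of adjacent, overlapping base pairs is an assumption made here for illustration, and the names `PAIRS` and `to_pair_sequence` are hypothetical. The 10 numerical sequences and the weighted pseudo-entropy are not defined in the abstract and are not reproduced.

```python
from itertools import combinations_with_replacement

ALPHABET = "acgt"

# The 10 unordered pairs (2-combinations with repetition) of {a, c, g, t}.
PAIRS = ["".join(p) for p in combinations_with_replacement(ALPHABET, 2)]


def to_pair_sequence(dna):
    """Rewrite a DNA string as a sequence over the 10 unordered pairs.

    Assumption (not stated in the abstract): each letter of the new
    sequence is the unordered pair formed by two adjacent bases, taken
    with an overlapping window of width 2.
    """
    dna = dna.lower()
    # Sorting the two bases makes "tg" and "gt" map to the same label.
    return ["".join(sorted(dna[i:i + 2])) for i in range(len(dna) - 1)]


if __name__ == "__main__":
    print(PAIRS)
    print(to_pair_sequence("atggtg"))
```

Under this assumption a sequence of length n yields n - 1 pair letters, and a single base substitution changes up to two of them.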

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • DNA / genetics*
  • Humans
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*
  • beta-Globins / genetics

Substances

  • beta-Globins
  • DNA