A DNA primary sequence is a string over the alphabet Ω = {a, c, g, t}. Using all 2-combinations of Ω with repetition allowed, we transform a DNA primary sequence into a sequence over a set of cardinality 10. With this 10-letter sequence we associate 10 nonnegative numerical sequences, from which we derive a 10-component vector by means of a weighted pseudo-entropy that reflects both the elements of a sequence and, especially, the order relation among them. This new quantitative characterization of DNA sequences is sensitive to substitution of string elements. An examination of the relationships among the β-globin genes of 15 species illustrates the utility of the proposed approach.
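As an illustration of the first step only, the following minimal Python sketch enumerates the 10 two-element combinations of Ω with repetition and rewrites a DNA string over that 10-letter set. The abstract does not state how pairs are read off the sequence, so the use of overlapping adjacent pairs, the order-insensitive mapping (e.g. "ca" and "ac" both map to "ac"), and the names `PAIRS` and `to_pair_sequence` are assumptions made for illustration. The 10 numerical sequences and the weighted pseudo-entropy are not defined in this abstract and are therefore not sketched.

```python
from itertools import combinations_with_replacement

ALPHABET = "acgt"

# The 10 unordered pairs (2-combinations with repetition) of {a, c, g, t}:
# aa, ac, ag, at, cc, cg, ct, gg, gt, tt
PAIRS = ["".join(p) for p in combinations_with_replacement(ALPHABET, 2)]


def to_pair_sequence(dna):
    """Rewrite a DNA string over the 10-letter pair alphabet.

    Assumption (not stated in the abstract): each overlapping adjacent
    pair of bases is mapped to its unordered 2-combination.
    """
    s = dna.lower()
    # Sorting the two bases makes the mapping order-insensitive.
    return ["".join(sorted(s[i:i + 2])) for i in range(len(s) - 1)]


if __name__ == "__main__":
    print(len(PAIRS))
    print(to_pair_sequence("atggc"))
```

Under these assumptions, a sequence of length n yields a 10-letter sequence of length n - 1, and every element of the result belongs to `PAIRS`.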
Copyright © 2010 Wiley Periodicals, Inc.