Similarity analysis of DNA sequences based on the weighted pseudo-entropy

Chun Li; Hong Ma; Yang Zhou; Xiaolei Wang; Xiaoqi Zheng

doi:10.1002/jcc.21656

J Comput Chem. 2011 Mar;32(4):675-80. doi: 10.1002/jcc.21656. Epub 2010 Oct 1.

Authors

Chun Li¹, Hong Ma, Yang Zhou, Xiaolei Wang, Xiaoqi Zheng

Affiliation

¹ Department of Mathematics, Bohai University, Jinzhou 121013, People's Republic of China. lchlmb@yahoo.com.cn

PMID: 20890910
DOI: 10.1002/jcc.21656

Abstract

A DNA primary sequence is a string of letters from the alphabet Ω = {a, c, g, t}. Using all 2-combinations of the set Ω, with repetition allowed, we transform a DNA primary sequence into a special sequence over a set of cardinality 10. With this 10-letter sequence we associate 10 nonnegative numerical sequences, and from them we derive a 10-component vector by means of a weighted pseudo-entropy, which reflects information about the elements of a sequence and, in particular, the order relation among them. This new quantitative characterization of DNA sequences is sensitive to substitutions of string elements. An examination of the relationships among the β-globin genes of 15 species illustrates the utility of the proposed approach.
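
The cardinality of 10 follows from counting the unordered pairs over Ω with repetition allowed: C(4, 2) + 4 = 10. The minimal Python sketch below enumerates those pairs and shows one possible way to rewrite a DNA string over them. The abstract does not say how pairs are read off the sequence, so the overlapping adjacent-pair windowing used here is an assumption, and the names `OMEGA`, `PAIRS`, and `to_pair_sequence` are hypothetical. The weighted pseudo-entropy itself is not defined in the abstract and is not reproduced.

```python
from itertools import combinations_with_replacement

OMEGA = "acgt"

# All 2-combinations of OMEGA with repetition allowed: 10 unordered pairs.
PAIRS = ["".join(p) for p in combinations_with_replacement(OMEGA, 2)]


def to_pair_sequence(dna):
    """Rewrite a DNA string over the 10 unordered pairs.

    Assumption (not stated in the abstract): pairs are taken from
    overlapping adjacent positions, and the order inside each pair is
    ignored, so "ca" and "ac" map to the same letter "ac".
    """
    s = dna.lower()
    return ["".join(sorted(s[i:i + 2])) for i in range(len(s) - 1)]


if __name__ == "__main__":
    print(PAIRS)
    print(to_pair_sequence("ATGCA"))
```

Under this assumed windowing, a sequence of length n yields n - 1 letters of the 10-letter alphabet, and a single base substitution changes at most two of them.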

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Base Sequence
DNA / genetics*
Humans
Sensitivity and Specificity
Sequence Analysis, DNA / methods*
beta-Globins / genetics

Substances

beta-Globins
DNA