Comparison of Real Frequencies of Strings vs. the Expected Ones Reveals the Information Capacity of Macromoleculae

J Biol Phys. 2003 Mar;29(1):23-38. doi: 10.1023/A:1022554613105.

Abstract

The information capacity of nucleotide sequences is defined through the calculation of the specific entropy of their frequency dictionary. The specific entropy of the frequency dictionary is calculated against the reconstructed dictionary; the latter bears the most probable continuations of the shorter strings. The measure developed makes it possible to distinguish sequences both from random ones and from those with a high level of (rather simple) order. Some implications of the developed methodology in the fields of genetics, bioinformatics, and molecular biology are discussed.

Keywords: Markov model; dictionary; entropy; information capacity; ordered sequence; random sequence; reconstructed dictionary; specific entropy.
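
The abstract gives no formulas, so the following Python sketch is only one plausible reading of the measure, not the authors' implementation. It assumes that the reconstructed dictionary of strings of length q is the maximum-entropy (Markov-type) extension of the real dictionary of length q - 1, that the specific entropy is the relative entropy of the real frequencies against the reconstructed ones, and that the sequence is read as a ring so that the shorter-string frequencies are consistent. The function names `frequency_dictionary` and `specific_entropy` are hypothetical.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Frequencies of all strings of length q (q >= 1).

    Assumption: the sequence is closed into a ring, so exactly
    len(seq) strings of each length are counted.
    """
    n = len(seq)
    ring = seq + seq[:q - 1]
    counts = Counter(ring[i:i + q] for i in range(n))
    return {word: count / n for word, count in counts.items()}


def specific_entropy(seq, q):
    """Relative entropy of the real dictionary of length q against the
    dictionary reconstructed from strings of length q - 1.

    Assumed reconstruction (maximum-entropy / Markov-type):
        expected(w) = f(w[:-1]) * f(w[1:]) / f(w[1:-1])
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    real = frequency_dictionary(seq, q)
    shorter = frequency_dictionary(seq, q - 1)
    # For q == 2 the overlap is the empty string, whose frequency is 1.
    overlap = frequency_dictionary(seq, q - 2) if q > 2 else {"": 1.0}
    total = 0.0
    for word, f in real.items():
        expected = shorter[word[:-1]] * shorter[word[1:]] / overlap[word[1:-1]]
        total += f * log(f / expected)
    return total
```

Under these assumptions, the strictly periodic sequence "ACGT" repeated ten times gives a specific entropy of log 4 at q = 2 and zero at q = 3: once pairs are known, the triplets of such a simply ordered sequence are fully predictable, which is the kind of low-capacity order the abstract refers to.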