Comparison of Real Frequencies of Strings vs. the Expected Ones Reveals the Information Capacity of Macromoleculae

Michael G Sadovsky

doi:10.1023/A:1022554613105

J Biol Phys. 2003 Mar;29(1):23-38. doi: 10.1023/A:1022554613105.

Author

Michael G Sadovsky¹

Affiliation

¹ Institute of Biophysics of Siberian Division of Russian Academy of Sciences, Akademgorodok, Krasnoyarsk, 660036.

Abstract

The information capacity of nucleotide sequences is defined through the calculation of the specific entropy of their frequency dictionary. The specific entropy of the frequency dictionary is calculated against the reconstructed dictionary; the latter bears the most probable continuations of the shorter strings. The measure developed here makes it possible to distinguish the sequences both from random ones and from those with a high level of (rather simple) order. Some implications of the developed methodology in the fields of genetics, bioinformatics, and molecular biology are discussed.
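The idea can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the reconstructed frequency of a string of length q is taken in the Markov-type form f(prefix) · f(suffix) / f(core), built from strings of length q − 1 and q − 2, and the specific entropy is taken as the relative entropy Σ f · ln(f / f_reconstructed). The sequence is also assumed to be closed into a ring so that the shorter dictionaries are exact marginals of the longer one. The function names are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Relative frequencies of all strings of length q (q >= 1) in seq.

    Assumption: the sequence is closed into a ring, so every one of its
    len(seq) positions starts exactly one string of length q.
    """
    n = len(seq)
    ring = seq + seq[:q - 1]
    counts = Counter(ring[i:i + q] for i in range(n))
    return {word: count / n for word, count in counts.items()}


def reconstructed_dictionary(seq, q):
    """Expected frequencies of length-q strings, rebuilt from shorter ones.

    Assumed Markov-type form: f(w[:-1]) * f(w[1:]) / f(w[1:-1]).
    Only strings that actually occur are reconstructed, which is all the
    entropy sum below needs. Requires q >= 2.
    """
    f_real = frequency_dictionary(seq, q)
    f_short = frequency_dictionary(seq, q - 1)
    # For q == 2 the shared core is the empty string, with frequency 1.
    f_core = frequency_dictionary(seq, q - 2) if q > 2 else {"": 1.0}
    return {
        w: f_short[w[:-1]] * f_short[w[1:]] / f_core[w[1:-1]]
        for w in f_real
    }


def specific_entropy(seq, q):
    """Relative entropy of the real dictionary against the reconstructed one."""
    f_real = frequency_dictionary(seq, q)
    f_rec = reconstructed_dictionary(seq, q)
    return sum(f * log(f / f_rec[w]) for w, f in f_real.items())


if __name__ == "__main__":
    periodic = "ACGT" * 10  # a sequence with a high level of simple order
    print(specific_entropy(periodic, 2))  # ln 4, about 1.386
    print(specific_entropy(periodic, 3))  # 0.0: triplets follow from pairs
```

Under these assumptions, the strictly periodic sequence deviates from its reconstruction only at q = 2; from q = 3 onward the real and reconstructed dictionaries coincide and the specific entropy vanishes, which is the behaviour one would expect from a sequence with simple order.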

Keywords: Markov model; dictionary; entropy; information capacity; ordered sequence; random sequence; reconstructed dictionary; specific entropy.