Entropy and complexity of finite sequences as fluctuating quantities

Biosystems. 2002 Jan;64(1-3):23-32. doi: 10.1016/s0303-2647(01)00171-x.

Abstract

The paper is devoted to the analysis of digitized sequences of real numbers and of discrete strings by means of the concepts of entropy and complexity. Special attention is paid to the random character of these quantities and to their fluctuation spectrum. As applications, we discuss neural spike trains and DNA sequences. We consider a given sequence as one finite-length realization of a certain random process; the other members of the ensemble are defined by appropriate surrogate sequences and surrogate processes. We show that n-gram entropies and the context-free grammatical complexity must be treated as fluctuating quantities, and we study the corresponding distributions. Different complexity measures reveal different aspects of a sequence. Finally, we show that the diversity of the entropy (which takes small values for pseudorandom strings) and the context-free grammatical complexity (which takes large values for pseudorandom strings) nonetheless give consistent results when the rankings of sample sequences from molecular biology and neuroscience, and of artificial control sequences, are compared.
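To make the idea of an n-gram entropy as a fluctuating quantity concrete, here is a minimal Python sketch. It assumes the plug-in (frequency-count) estimator for the block entropy H_n and uses random shuffling as the surrogate construction. The abstract does not say which estimator or which surrogates the authors use, so both are illustrative choices. The function names (`ngram_entropy`, `shuffle_surrogates`, `entropy_distribution`) and the toy string are hypothetical. The sketch computes H_n for one sequence and the mean and spread of H_n over an ensemble of surrogates.

```python
import math
import random
from collections import Counter


def ngram_entropy(seq, n):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of the
    overlapping n-grams (blocks of length n) of a finite sequence."""
    if n < 1:
        raise ValueError("n must be >= 1")
    blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def shuffle_surrogates(seq, count, seed=0):
    """Yield `count` surrogates made by random permutation (an assumed
    surrogate type): symbol frequencies of `seq` are kept, while its
    correlations are destroyed."""
    rng = random.Random(seed)
    for _ in range(count):
        s = list(seq)
        rng.shuffle(s)
        yield s


def entropy_distribution(seq, n, count=200, seed=0):
    """Sample mean and standard deviation of H_n over an ensemble of
    shuffled surrogates of `seq`."""
    if count < 2:
        raise ValueError("need at least two surrogates for a spread")
    values = [ngram_entropy(s, n) for s in shuffle_surrogates(seq, count, seed)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    dna = "ACGTTGCAACGTACGTTTGACCAGTAGCAT" * 4  # hypothetical toy string
    h_orig = ngram_entropy(dna, 3)
    mean, sd = entropy_distribution(dna, 3)
    print(f"H_3(original)   = {h_orig:.3f} bits")
    print(f"H_3(surrogates) = {mean:.3f} +/- {sd:.3f} bits")
```

Shuffling preserves symbol frequencies, so H_1 is the same for every surrogate, while for n > 1 the values typically scatter. Comparing the original H_n with this spread is one way to read a single sequence as a realization drawn from an ensemble. A different surrogate (for example, one that preserves dinucleotide counts) would define a different ensemble and a different distribution.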

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Action Potentials
  • Algorithms*
  • Data Interpretation, Statistical
  • Entropy*
  • Information Theory
  • Models, Neurological
  • Sequence Analysis, DNA / statistics & numerical data