The entropy rate of Linear Additive Markov Processes

PLoS One. 2024 Apr 5;19(4):e0295074. doi: 10.1371/journal.pone.0295074. eCollection 2024.

Abstract

This work derives a theoretical value for the entropy of a Linear Additive Markov Process (LAMP), an expressive but simple model able to generate sequences with a given autocorrelation structure. Our research establishes that the theoretical entropy rate of a LAMP model is equivalent to the theoretical entropy rate of the underlying first-order Markov Chain. The LAMP model captures complex relationships and long-range dependencies in data with similar expressibility to a higher-order Markov process. While a higher-order Markov process has a polynomial parameter space, a LAMP model is characterised only by a probability distribution and the transition matrix of an underlying first-order Markov Chain. This surprising result can be explained by the information balance between the additional structure imposed by the next state distribution of the LAMP model, and the additional randomness of each new transition. Understanding the entropy of the LAMP model provides a tool to model complex dependencies in data while retaining useful theoretical results. To emphasise the practical applications, we use the LAMP model to estimate the entropy rate of the LastFM, BrightKite, Wikispeedia and Reuters-21578 datasets. We compare estimates calculated using frequency probability estimates, a first-order Markov model and the LAMP model, also considering two approaches to ensure the transition matrix is irreducible. In most cases, the LAMP entropy rates are lower than those of the alternatives, suggesting that the LAMP model is better at accommodating structural dependencies in the processes, achieving a more accurate estimate of the true entropy.
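
The abstract's central result ties the LAMP entropy rate to that of the underlying first-order Markov chain. For a stationary chain with transition matrix P and stationary distribution π, that quantity has the standard closed form H = -Σ_i π_i Σ_j P_ij log2 P_ij. The following is a minimal illustrative sketch of that formula only, not the authors' code: the function names, the power-iteration solver and the example matrix are assumptions made here, and the chain is assumed to be irreducible and aperiodic so that the iteration converges.

```python
import math


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration (assumes an irreducible, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of pi <- pi P.
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def markov_entropy_rate(P):
    """Entropy rate, in bits per step, of a first-order Markov chain:
    H = -sum_i pi_i * sum_j P_ij * log2(P_ij), skipping zero entries."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log2(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0
    )


if __name__ == "__main__":
    # Hypothetical two-state chain, chosen only for illustration.
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    print(markov_entropy_rate(P))
```

In practice, P would be estimated from data, and zero rows or disconnected states would first need one of the irreducibility treatments the abstract mentions; this sketch does not implement them.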

MeSH terms

  • Algorithms*
  • Entropy
  • Linear Models
  • Markov Chains
  • Probability

Grants and funding

B. Smart would like to acknowledge the support of a Westpac Future Leaders Scholarship. M. Roughan and L. Mitchell are supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (project DP210103700). There was no additional external funding received for this study.