Zipf's Law Arises Naturally When There Are Underlying, Unobserved Variables

Laurence Aitchison; Nicola Corradi; Peter E Latham

doi:10.1371/journal.pcbi.1005110

PLoS Comput Biol. 2016 Dec 20;12(12):e1005110. doi: 10.1371/journal.pcbi.1005110. eCollection 2016 Dec.

Authors

Laurence Aitchison¹, Nicola Corradi², Peter E Latham¹

Affiliations

¹ Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom.
² Weill Medical College, Cornell University, New York, New York, United States of America.

Abstract

Zipf's law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many domains. While there are models that explain Zipf's law in each of them, those explanations are typically domain specific. Recently, methods from statistical physics were used to show that a fairly broad class of models does provide a general explanation of Zipf's law. This explanation rests on the observation that real-world data is often generated from underlying causes, known as latent variables. Those latent variables mix together multiple models that do not obey Zipf's law, giving a model that does. Here we extend that work both theoretically and empirically. Theoretically, we provide a far simpler and more intuitive explanation of Zipf's law, which at the same time considerably extends the class of models to which this explanation can apply. Furthermore, we also give methods for verifying whether this explanation applies to a particular dataset. Empirically, these advances allowed us to extend this explanation to important classes of data, including word frequencies (the first domain in which Zipf's law was discovered), data with variable sequence length, and multi-neuron spiking activity.
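
The statement that probability is inversely proportional to rank means that, on a log-log plot, frequency against rank falls on a line of slope -1. The following is a minimal sketch of that check only; it is not the verification method developed in the paper. The function name `zipf_slope` and the example counts are hypothetical, chosen for illustration.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank)."""
    # Rank observations from most to least frequent (rank 1 = most frequent),
    # dropping zero counts because their logarithm is undefined.
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two nonzero counts")
    total = sum(ordered)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(c / total) for c in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares: slope = cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical counts exactly proportional to 1/rank (an idealized Zipf case).
ideal_counts = [1200 / rank for rank in range(1, 51)]
slope = zipf_slope(ideal_counts)
print(f"fitted slope: {slope:.3f}")
```

A fitted slope near -1 is consistent with Zipf's law, but a straight-line fit on log-log axes is only a rough diagnostic and does not by itself establish that a dataset follows a power law.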

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Action Potentials
Databases, Factual
Entropy
Language
Models, Neurological
Models, Theoretical*

Grants and funding

R01 EY012978/EY/NEI NIH HHS/United States