Exploring empirical rank-frequency distributions longitudinally through a simple stochastic process

PLoS One. 2014 Apr 22;9(4):e94920. doi: 10.1371/journal.pone.0094920. eCollection 2014.

Abstract

The frequent appearance of empirical rank-frequency laws, such as Zipf's law, in a wide range of domains reinforces the importance of understanding and modeling these laws and rank-frequency distributions in general. In this spirit, we utilize a simple stochastic cascade process to simulate several empirical rank-frequency distributions longitudinally. We focus especially on limiting the process's complexity to increase accessibility for non-experts in mathematics. The process provides a good fit for many empirical distributions because the stochastic multiplicative nature of the process leads to an often-observed concave rank-frequency distribution (on a log-log scale) and the finiteness of the cascade replicates real-world finite-size effects. Furthermore, we show that repeated trials of the process can roughly simulate the longitudinal variation of empirical ranks. However, we find that the empirical variation is often less than the average simulated process variation, likely due to longitudinal dependencies in the empirical datasets. Finally, we discuss the process's limitations and practical applications.
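
The abstract does not state the rules of the cascade, so the following is only a generic sketch of a finite stochastic multiplicative cascade, not the authors' process. It assumes a unit mass that is split in two at every level by a uniform random fraction; the names `cascade` and `repeated_trials`, the uniform splitting rule, the depth, and the seed are all illustrative assumptions. Sorting the final pieces in descending order gives a rank-frequency list, and repeating the process gives a set of trials whose ranks can be compared.

```python
import random


def cascade(depth, rng):
    """Split a unit mass `depth` times and return the pieces, largest first.

    Assumption: each piece is divided into two parts by a fraction drawn
    uniformly from [0, 1). The paper's actual splitting rule may differ.
    """
    pieces = [1.0]
    for _ in range(depth):
        next_pieces = []
        for mass in pieces:
            u = rng.random()  # random share given to the first child
            next_pieces.append(mass * u)
            next_pieces.append(mass * (1.0 - u))
        pieces = next_pieces
    # Rank 1 is the largest piece; the cascade is finite, so there are
    # exactly 2**depth ranks.
    return sorted(pieces, reverse=True)


def repeated_trials(depth, trials, seed=0):
    """Run the cascade several times with one seeded generator."""
    rng = random.Random(seed)
    return [cascade(depth, rng) for _ in range(trials)]


if __name__ == "__main__":
    for run in repeated_trials(depth=6, trials=3):
        # Show the five largest shares of each trial.
        print([round(x, 4) for x in run[:5]])
```

Plotting rank against share on log-log axes for one such run would be one way to inspect the shape of the curve, and comparing the share at a given rank across runs gives a rough sense of trial-to-trial variation under these assumptions.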

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Books
  • Commerce
  • Computer Simulation
  • Databases as Topic
  • Empirical Research*
  • France
  • Motion Pictures / economics
  • Statistical Distributions*
  • Stochastic Processes
  • United States

Grants and funding

Funding came from general departmental funds at Aalto University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.