Detecting memory and structure in human navigation patterns using Markov chain models of varying order

PLoS One. 2014 Jul 11;9(7):e102070. doi: 10.1371/journal.pone.0102070. eCollection 2014.

Abstract

One of the most frequently used models for understanding human navigation on the Web is the Markov chain model, where Web pages are represented as states and hyperlinks as transitions, each with a probability of navigating from one page to another. Predominantly, human navigation on the Web has been thought to satisfy the memoryless Markov property, which states that the next page a user visits depends only on her current page and not on previously visited ones. This idea has found its way into numerous applications, such as Google's PageRank algorithm. Recent studies have suggested that human navigation may be better modeled using higher order Markov chain models, i.e., the next page depends on a longer history of past clicks. Yet this finding is preliminary and does not account for the greater complexity of higher order Markov chain models, which is why the memoryless model is still widely used. In this work we present in detail a diverse array of advanced inference methods for determining the appropriate Markov chain order. We highlight the strengths and weaknesses of each method and apply them to investigate memory and structure in human navigation on the Web. Our experiments reveal that the complexity of higher order models grows faster than their utility, and thus we confirm that the memoryless model is a quite practical model for human navigation on a page level. However, when we expand our analysis to a topical level, where we abstract away from specific page transitions to transitions between topics, we find that the memoryless assumption is violated and specific regularities can be observed. We report results from experiments with two types of navigational datasets (goal-oriented vs. free-form) and observe interesting structural differences that make a strong argument for more contextual studies of human navigation in future work.
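
The trade-off between model fit and model complexity described above can be illustrated with a minimal sketch. It fits Markov chains of order 0 to `max_order` to a set of navigation paths by maximum likelihood and scores each with the Akaike information criterion (AIC). AIC is used here only as one common example of a complexity-penalizing criterion; the abstract does not name the specific inference methods, so this should not be read as the authors' procedure. The function names and the toy paths are hypothetical.

```python
import math
from collections import Counter, defaultdict


def count_transitions(paths, order, skip):
    """Count next-state occurrences for each history of length `order`.

    Scoring starts at position `skip` in every path, so that models of
    different order are evaluated on exactly the same observations.
    """
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(skip, len(path)):
            history = tuple(path[i - order:i])  # empty tuple when order == 0
            counts[history][path[i]] += 1
    return counts


def log_likelihood(counts):
    """Log-likelihood of the data under the maximum likelihood estimates."""
    ll = 0.0
    for next_counts in counts.values():
        total = sum(next_counts.values())
        for n in next_counts.values():
            ll += n * math.log(n / total)
    return ll


def aic_by_order(paths, max_order):
    """Return {order: AIC} for orders 0..max_order (lower is better)."""
    n_states = len({state for path in paths for state in path})
    scores = {}
    for order in range(max_order + 1):
        counts = count_transitions(paths, order, skip=max_order)
        # An order-k chain over n states has n**k * (n - 1) free parameters,
        # so the penalty grows exponentially with the order.
        n_params = (n_states ** order) * (n_states - 1)
        scores[order] = 2 * n_params - 2 * log_likelihood(counts)
    return scores


# Toy data (made up for illustration): a strictly alternating path.
toy_paths = [["A", "B", "A", "B", "A", "B"]]
toy_scores = aic_by_order(toy_paths, max_order=2)
best_order = min(toy_scores, key=toy_scores.get)
print(toy_scores, best_order)
```

On this toy path the first order and second order models fit the data equally well, but the second order model has twice as many parameters, so the criterion prefers order 1. This is the same kind of fit-versus-complexity reasoning the abstract describes, on a much smaller scale.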

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Humans
  • Markov Chains*
  • Memory / physiology*
  • Models, Theoretical

Grants and funding

This research was in part funded by the DFG German Science Fund research project "Pragmatics and Semantics in Social Tagging Systems II" (STR 1191/3-2) as well as the FWF Austrian Science Fund research project "Navigability of Decentralized Information Networks" (P24866). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.