The dimensionality of discourse

Proc Natl Acad Sci U S A. 2010 Mar 16;107(11):4866-71. doi: 10.1073/pnas.0908315107. Epub 2010 Mar 1.

Abstract

The paragraph spaces of five text corpora, of different genres and intended audiences, in four different languages, all show the same two-scale structure, with the dimension at short distances being lower than at long distances. In all five cases the short-distance dimension is approximately eight. Control simulations with randomly permuted word instances do not exhibit a low-dimensional structure. The observed topology places important constraints on the way in which authors construct prose, and these constraints may be universal.
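
To make the idea of a scale-dependent dimension concrete, the following is a minimal sketch of one common estimator, the correlation dimension, in which the fraction of point pairs closer than r is assumed to grow like r^d. The abstract does not state which estimator or embedding was used, so the function names (`correlation_sum`, `local_dimension`), the radii, and the toy data are illustrative assumptions only, not the authors' procedure.

```python
import math
from itertools import combinations


def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is at most r."""
    total = 0
    close = 0
    for p, q in combinations(points, 2):
        total += 1
        if math.dist(p, q) <= r:
            close += 1
    return close / total


def local_dimension(points, r_small, r_large):
    """Slope of log C(r) against log r between two radii.

    If C(r) grows like r**d, this slope approximates d over that range of scales.
    """
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    if c_small == 0:
        raise ValueError("no pairs within r_small; choose a larger radius")
    return math.log(c_large / c_small) / math.log(r_large / r_small)


# Hypothetical toy data: 400 evenly spaced points on a straight line
# embedded in 3-D space. The ambient space has three coordinates, but the
# points themselves trace out a one-dimensional set.
line = [(i / 400, 0.0, 0.0) for i in range(400)]

# The radii are arbitrary illustrative choices, not values from the paper.
estimate = local_dimension(line, 0.0105, 0.1005)
print(f"estimated dimension: {estimate:.3f}")  # close to 1 for a line
```

Evaluating the same slope over a short-distance range and a long-distance range of radii is one way a two-scale structure could show up: the two estimates would differ. For real paragraph vectors, the choice of embedding and distance measure would matter and is not specified here.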

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Language
  • Models, Theoretical
  • Semantics
  • Textbooks as Topic
  • Writing*