Wide-coverage probabilistic sentence processing

J Psycholinguist Res. 2000 Nov;29(6):647-69. doi: 10.1023/a:1026560822390.

Abstract

This paper describes a fully implemented, broad-coverage model of human syntactic processing. The model uses probabilistic parsing techniques, which combine phrase structure, lexical category, and limited subcategory probabilities with an incremental, left-to-right "pruning" mechanism based on cascaded Markov models. The parameters of the system are established through a uniform training algorithm, which determines maximum-likelihood estimates from a parsed corpus. The probabilistic parsing mechanism enables the system to achieve good accuracy on typical, "garden-variety" language (i.e., when tested on corpora). Furthermore, the incremental probabilistic ranking of the preferred analyses during parsing also naturally explains observed human behavior for a range of garden-path structures. We do not make strong psychological claims about the specific probabilistic mechanism discussed here, which is limited by a number of practical considerations. Rather, we argue that incremental probabilistic parsing models are, in general, extremely well suited to explaining this dual nature of human linguistic performance: generally good, but occasionally pathological.
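As a rough illustration of two ingredients named in the abstract, maximum-likelihood (relative-frequency) estimation of rule probabilities from a parsed corpus and incremental pruning of ranked partial analyses, the Python sketch below shows the general idea under invented assumptions. The toy treebank, the tuple tree encoding, and the beam width are made up for this sketch; the paper's actual system uses cascaded Markov models with lexical-category and limited subcategory probabilities, which are not reproduced here.

```python
from collections import Counter, defaultdict

# Toy "parsed corpus": each tree is a nested tuple (label, child, ...);
# leaves are (POS, word) pairs. The example trees are invented for illustration.
TREEBANK = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")),
          ("VP", ("VBD", "barked"))),
    ("S", ("NP", ("DT", "the"), ("NN", "cat")),
          ("VP", ("VBD", "slept"))),
]

def count_rules(tree, counts):
    """Recursively count rewrite rules LHS -> RHS observed in one tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1   # lexical rule: POS -> word
        return
    rhs = tuple(child[0] for child in children)
    counts[(label, rhs)] += 1
    for child in children:
        count_rules(child, counts)

def estimate_mle(treebank):
    """Maximum-likelihood (relative-frequency) estimates of rule probabilities."""
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

def prune(beam, width=3):
    """Incremental pruning: keep only the `width` best-scoring partial analyses.

    `beam` is a list of (analysis, log_probability) pairs; the width of 3 is an
    arbitrary choice for this sketch.
    """
    return sorted(beam, key=lambda item: item[1], reverse=True)[:width]

if __name__ == "__main__":
    probs = estimate_mle(TREEBANK)
    for (lhs, rhs), p in sorted(probs.items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

In such a setup, a garden-path effect corresponds to the ultimately correct analysis being pruned from the beam early on because its incremental probability falls below that of the competing, locally preferred readings.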

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cognition*
  • Humans
  • Language
  • Linguistics*
  • Markov Chains
  • Models, Statistical*
  • Psycholinguistics / statistics & numerical data
  • Speech Perception*
  • Vocabulary