The statistical signature of morphosyntax: a study of Hungarian and Italian infant-directed speech

Cognition. 2012 Nov;125(2):263-87. doi: 10.1016/j.cognition.2012.06.010. Epub 2012 Aug 6.

Abstract

Does statistical learning (Saffran, Aslin, & Newport, 1996) offer a universal segmentation strategy for young language learners? Previous studies on large corpora of English and structurally similar languages have shown that statistical segmentation can be an effective strategy. However, many of the world's languages have richer morphological systems, sometimes with several affixes attached to a stem (e.g. Hungarian: iskoláinkban: iskolá-i-nk-ban school.pl.poss1pl.inessive 'in our schools'). In these languages, word boundaries and morpheme boundaries do not coincide. Does the internal structure of words affect segmentation? What word forms does segmentation yield in morphologically rich languages: complex word forms or separate stems and affixes? The present paper answers these questions by exploring different segmentation algorithms in infant-directed speech corpora from two typologically and structurally different languages, Hungarian and Italian. The results suggest that the morphological and syntactic type of a language has an impact on statistical segmentation, with different strategies working best in different languages. Specifically, the direction of segmentation seems to be sensitive to the affixation order of a language. Thus, backward probabilities are more effective in Hungarian, a heavily suffixing language, whereas forward probabilities are more informative in Italian, which has fewer suffixes and a large number of phrase-initial function words. The consequences of these findings for potential segmentation and word learning strategies are discussed.
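The contrast between forward and backward probabilities can be made concrete with a small sketch. Forward transitional probability (TP) is P(y|x) = f(xy)/f(x), and backward TP is P(x|y) = f(xy)/f(y), where f counts how often a unit or a pair of adjacent units occurs. The code below assumes syllabified utterances and places a word boundary wherever a TP is a local minimum. It does not reproduce the paper's algorithms: the local-minimum rule is one common heuristic, and the abstract does not state which boundary rule or segmentation unit the study used. The function names and the toy corpus are hypothetical, built only to show that the two directions can disagree on the same input.

```python
from collections import Counter
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def transitional_probabilities(
    utterances: List[List[str]],
) -> Tuple[Dict[Bigram, float], Dict[Bigram, float]]:
    """Return (forward, backward) TPs for every attested syllable pair.

    forward[(x, y)]  = f(xy) / f(x as first member of a pair)   ~ P(y | x)
    backward[(x, y)] = f(xy) / f(y as second member of a pair)  ~ P(x | y)
    Pairs are counted only within utterances, never across them.
    """
    bigrams: Counter = Counter()
    for utt in utterances:
        bigrams.update(zip(utt, utt[1:]))
    first: Counter = Counter()   # how often each syllable starts a pair
    second: Counter = Counter()  # how often each syllable ends a pair
    for (x, y), n in bigrams.items():
        first[x] += n
        second[y] += n
    forward = {(x, y): n / first[x] for (x, y), n in bigrams.items()}
    backward = {(x, y): n / second[y] for (x, y), n in bigrams.items()}
    return forward, backward


def segment(utt: List[str], tp: Dict[Bigram, float]) -> List[str]:
    """Split an utterance wherever the TP between two adjacent syllables is
    lower than every neighbouring TP (a local minimum). Unseen pairs get 0.0."""
    if len(utt) < 2:
        return ["".join(utt)] if utt else []
    tps = [tp.get((a, b), 0.0) for a, b in zip(utt, utt[1:])]
    words: List[str] = []
    current = [utt[0]]
    for i, t in enumerate(tps):
        # TPs immediately to the left and right of position i, if they exist.
        neighbours = tps[max(i - 1, 0):i] + tps[i + 1:i + 2]
        if neighbours and all(t < n for n in neighbours):
            words.append("".join(current))
            current = []
        current.append(utt[i + 1])
    words.append("".join(current))
    return words


if __name__ == "__main__":
    # Hypothetical toy input: three different stems, each followed by the
    # same suffix-like syllable "ban". Not data from the study.
    corpus = [["ka", "pu", "ban"], ["mi", "do", "ban"], ["te", "lo", "ban"]]
    fwd, bwd = transitional_probabilities(corpus)
    for utt in corpus:
        print(segment(utt, fwd), segment(utt, bwd))
    # prints:
    # ['kapuban'] ['kapu', 'ban']
    # ['midoban'] ['mido', 'ban']
    # ['teloban'] ['telo', 'ban']
```

In this toy input every stem is always followed by "ban", so forward TPs are uniformly 1.0 and find no boundary. Because "ban" follows three different stems, however, the backward TP into it drops to 1/3, and the backward-based split falls between stem and suffix. Real corpora are far noisier, and the sketch illustrates only the mechanism, not the paper's results.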

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Humans
  • Hungary
  • Infant
  • Italy
  • Language Development
  • Language*
  • Linguistics*
  • Speech*