The acquisition of allophonic rules: statistical learning with linguistic constraints

Sharon Peperkamp; Rozenn Le Calvez; Jean-Pierre Nadal; Emmanuel Dupoux

doi:10.1016/j.cognition.2005.10.006

Cognition. 2006 Oct;101(3):B31-41. doi: 10.1016/j.cognition.2005.10.006. Epub 2005 Dec 20.

Authors

Sharon Peperkamp¹, Rozenn Le Calvez, Jean-Pierre Nadal, Emmanuel Dupoux

Affiliation

¹ Laboratoire de Sciences Cognitives et Psycholinguistique, EHESS-ENS-CNRS, 46 Rue d'Ulm, 75005 Paris, France. Sharon.Peperkamp@ens.fr

PMID: 16364279

Abstract

Phonological rules relate surface phonetic word forms to abstract underlying forms that are stored in the lexicon. Infants must thus acquire these rules in order to infer the abstract representation of words. We implement a statistical learning algorithm for the acquisition of one type of rule, namely allophony, which introduces context-sensitive phonetic variants of phonemes. This algorithm is based on the observation that different realizations of a single phoneme typically do not appear in the same contexts (ideally, they have complementary distributions). In particular, it measures the discrepancies in context probabilities for each pair of phonetic segments. In Experiment 1, we test the algorithm's performance on a pseudo-language and show that it is robust to statistical noise due to sampling and coding errors, and to non-systematic rule application. In Experiment 2, we show that a natural corpus of semiphonetically transcribed child-directed speech in French presents a very large number of near-complementary distributions that do not correspond to existing allophonic rules. These spurious allophonic rules can be eliminated by a linguistically motivated filtering mechanism based on a phonetic representation of segments. We discuss the role of a priori linguistic knowledge in the statistical learning of phonology.
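
The idea of measuring discrepancies in context probabilities for a pair of segments can be illustrated with a minimal sketch. The abstract does not specify the measure, the context window, or any smoothing, so the choices below are assumptions made for illustration only: the context is the immediately following segment, the discrepancy is a symmetrized Kullback-Leibler divergence, and add-one smoothing is applied. The toy corpus and the function names `context_counts` and `symmetric_kl` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def context_counts(corpus):
    """Count, for each segment, how often each following segment occurs."""
    counts = defaultdict(Counter)
    for utterance in corpus:
        # Pad with '#' so that utterance-final segments also have a context.
        padded = list(utterance) + ["#"]
        for seg, nxt in zip(padded, padded[1:]):
            counts[seg][nxt] += 1
    return counts


def symmetric_kl(counts, s1, s2, contexts):
    """Symmetrized KL divergence between the context distributions of s1 and s2.

    Assumption: add-one smoothing, so every context has non-zero probability.
    """
    n1 = sum(counts[s1].values())
    n2 = sum(counts[s2].values())
    k = len(contexts)
    total = 0.0
    for c in contexts:
        p1 = (counts[s1][c] + 1) / (n1 + k)
        p2 = (counts[s2][c] + 1) / (n2 + k)
        # KL(p1||p2) + KL(p2||p1) simplifies to this single term per context.
        total += (p1 - p2) * log(p1 / p2)
    return total


# Hypothetical toy corpus: 'R' occurs only before 't', 'r' only before vowels,
# whereas 'p' and 'k' occur in exactly the same contexts.
corpus = ["para", "kara", "paRta", "kaRta", "piri", "kiri", "piRti", "kiRti"]
counts = context_counts(corpus)
contexts = {c for following in counts.values() for c in following}

d_allophones = symmetric_kl(counts, "r", "R", contexts)
d_phonemes = symmetric_kl(counts, "p", "k", contexts)
```

In this toy setting the pair with complementary distributions (`r`, `R`) receives a larger score than the pair that shares its contexts (`p`, `k`). A high score alone marks a pair only as a candidate: as the abstract notes for the French corpus, many near-complementary pairs are spurious and need a further, phonetically informed filter, which is not sketched here.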

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Linguistics / methods
Linguistics / statistics & numerical data*
Models, Statistical
Phonetics*
Verbal Learning*