In Search of Salience: Focus Detection in the Speech of Different Talkers

Lang Speech. 2022 Sep;65(3):650-680. doi: 10.1177/00238309211046029. Epub 2021 Nov 28.

Abstract

Many different prosodic cues can help listeners predict upcoming speech. However, no research to date has assessed listeners' processing of preceding prosody from different speakers. The present experiments examine (1) whether individual speakers (of the same language variety) are likely to vary in their production of preceding prosody; (2) to the extent that there is talker variability, whether listeners are flexible enough to use any prosodic cues signaled by the individual speaker; and (3) whether types of prosodic cues (e.g., F0 versus duration) vary in informativeness. Using a phoneme-detection task, we examined whether listeners can entrain to different combinations of preceding prosodic cues to predict where focus will fall in an utterance. We used unsynthesized sentences recorded by four female native speakers of Australian English who happened to use different preceding cues to produce sentences with prosodic focus: a combination of pre-focus overall duration cues, F0 and intensity (mean, maximum, range), and a longer pre-target interval before the focused word onset (Speaker 1); only mean F0 cues, mean and maximum intensity, and a longer pre-target interval (Speaker 2); only pre-target interval duration (Speaker 3); and only pre-focus overall duration and maximum intensity (Speaker 4). Listeners could entrain to almost every speaker's cues (the exception being Speaker 4's use of only pre-focus overall duration and maximum intensity), and could use whatever cues were available even when one of the cue sources was rendered uninformative. Our findings demonstrate both speaker variability and listener flexibility in the processing of prosodic focus.

Keywords: Prosody; cue weighting; focus; prosodic entrainment; talker variability.

MeSH terms

  • Australia
  • Cues
  • Female
  • Humans
  • Speech Acoustics
  • Speech Perception*
  • Speech*