Sequential stream segregation of voiced and unvoiced speech sounds based on fundamental frequency

Marion David; Mathieu Lavandier; Nicolas Grimault; Andrew J Oxenham

doi:10.1016/j.heares.2016.11.016

Hear Res. 2017 Feb:344:235-243. doi: 10.1016/j.heares.2016.11.016. Epub 2016 Dec 5.

Authors

Marion David¹, Mathieu Lavandier², Nicolas Grimault³, Andrew J Oxenham⁴

Affiliations

¹ Department of Psychology, University of Minnesota, Minneapolis, MN, 55455, USA. Electronic address: david602@umn.edu.
² Univ. Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, F-69518, Vaulx-en-Velin Cedex, France.
³ Cognition Auditive et Psychoacoustique, Centre de Recherche en Neurosciences de Lyon, Université Lyon 1, UMR CNRS 5292, Avenue Tony Garnier, 69366, Lyon Cedex 07, France.
⁴ Department of Psychology, University of Minnesota, Minneapolis, MN, 55455, USA.

Abstract

Differences in fundamental frequency (F0) between voiced sounds are known to be a strong cue for stream segregation. However, speech consists of both voiced and unvoiced sounds, and less is known about whether and how the unvoiced portions are segregated. This study measured listeners' ability to integrate or segregate sequences of consonant-vowel tokens, comprising a voiceless fricative and a vowel, as a function of the F0 difference between interleaved sequences of tokens. A performance-based measure was used, in which listeners detected the presence of a repeated token either within one sequence or between the two sequences (measures of voluntary and obligatory streaming, respectively). The results showed a systematic increase of voluntary stream segregation as the F0 difference between the two interleaved sequences increased from 0 to 13 semitones, suggesting that F0 differences allowed listeners to segregate speech sounds, including the unvoiced portions. In contrast to the consistent effects of voluntary streaming, the trend towards obligatory stream segregation at large F0 differences failed to reach significance. Listeners were no longer able to perform the voluntary-streaming task reliably when the unvoiced portions were removed from the stimuli, suggesting that the unvoiced portions were used and correctly segregated in the original task. The results demonstrate that streaming based on F0 differences occurs for natural speech sounds, and that the unvoiced portions are correctly assigned to the corresponding voiced portions.
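As an illustrative aside on the 0 to 13 semitone range mentioned in the abstract: under the standard equal-tempered definition, a difference of n semitones corresponds to a frequency ratio of 2^(n/12), so 13 semitones is slightly more than an octave (a ratio of about 2.12). The sketch below only shows that conversion. The 100 Hz base F0 and the function names are hypothetical and are not taken from the study, which does not state its F0 values in this abstract.

```python
def f0_ratio(semitones: float) -> float:
    """Frequency ratio for an interval given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(base_f0_hz: float, semitones: float) -> float:
    """F0 in Hz after raising base_f0_hz by the given number of semitones."""
    return base_f0_hz * f0_ratio(semitones)


if __name__ == "__main__":
    base_f0_hz = 100.0  # hypothetical base F0, chosen only for illustration
    for semitones in (0, 13):  # endpoints of the range named in the abstract
        print(f"{semitones:>2} semitones -> {shifted_f0(base_f0_hz, semitones):.2f} Hz")
```

With the assumed 100 Hz base, the two interleaved sequences at the largest separation would sit at roughly 100 Hz and 212 Hz.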

Keywords: Fundamental frequency; Speech sounds; Stream segregation.

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Acoustic Stimulation
Adolescent
Adult
Audiometry, Speech
Auditory Threshold
Cues*
Female
Humans
Male
Middle Aged
Speech Acoustics*
Speech Intelligibility*
Speech Perception*
Voice Quality*
Young Adult

Grants and funding

R01 DC007657/DC/NIDCD NIH HHS/United States