A multimodal spectral approach to characterize rhythm in natural speech

Anna Maria Alexandrou; Timo Saarinen; Jan Kujala; Riitta Salmelin

doi:10.1121/1.4939496

J Acoust Soc Am. 2016 Jan;139(1):215-26. doi: 10.1121/1.4939496.

Authors

Anna Maria Alexandrou¹, Timo Saarinen¹, Jan Kujala¹, Riitta Salmelin¹

Affiliation

¹ Department of Neuroscience and Biomedical Engineering, Aalto University, FI-00076 AALTO, Finland.

PMID: 26827019
DOI: 10.1121/1.4939496

Abstract

Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
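
The coherence measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the signals are synthetic, and the 5 Hz shared rhythm, 50 Hz sampling rate, noise level, 2 s segment length, and the helper names `dft_bin` and `coherence` are all assumptions chosen for illustration. The sketch estimates magnitude-squared coherence by averaging Hann-windowed, non-overlapping segments (a Welch-style estimate) and reports the frequency at which two noisy signals are most strongly coupled.

```python
import cmath
import math
import random


def dft_bin(segment, k):
    """Complex DFT coefficient of `segment` at frequency bin k (naive sum)."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n)
               for i, v in enumerate(segment))


def coherence(x, y, fs, nperseg):
    """Magnitude-squared coherence, averaged over non-overlapping
    Hann-windowed segments. Returns (frequencies in Hz, coherence values)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / nperseg)
              for i in range(nperseg)]
    nbins = nperseg // 2 + 1
    pxx = [0.0] * nbins   # summed auto-spectrum of x
    pyy = [0.0] * nbins   # summed auto-spectrum of y
    pxy = [0j] * nbins    # summed cross-spectrum of x and y
    for start in range(0, len(x) - nperseg + 1, nperseg):
        xs = [w * v for w, v in zip(window, x[start:start + nperseg])]
        ys = [w * v for w, v in zip(window, y[start:start + nperseg])]
        for k in range(nbins):
            fx, fy = dft_bin(xs, k), dft_bin(ys, k)
            pxx[k] += abs(fx) ** 2
            pyy[k] += abs(fy) ** 2
            pxy[k] += fx * fy.conjugate()
    freqs = [k * fs / nperseg for k in range(nbins)]
    coh = [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) for k in range(nbins)]
    return freqs, coh


# Synthetic stand-ins (assumptions, not recorded data): an "EMG" trace and an
# "acoustic envelope" share a 5 Hz rhythm but carry independent noise.
random.seed(0)
fs = 50.0        # sampling rate in Hz (assumed)
duration = 40.0  # seconds (assumed)
rhythm_hz = 5.0  # shared rhythm (assumed)
t = [i / fs for i in range(int(fs * duration))]
emg = [math.sin(2 * math.pi * rhythm_hz * ti) + random.gauss(0, 1) for ti in t]
envelope = [math.sin(2 * math.pi * rhythm_hz * ti + 0.8) + random.gauss(0, 1)
            for ti in t]

freqs, coh = coherence(emg, envelope, fs, nperseg=100)

# Locate the coherence peak, skipping the DC bin.
peak = max(range(1, len(coh)), key=lambda k: coh[k])
peak_freq, peak_coh = freqs[peak], coh[peak]
print(f"peak coherence {peak_coh:.2f} at {peak_freq:.1f} Hz")
```

Because coherence normalizes the cross-spectrum by both auto-spectra, it emphasizes frequencies at which the two signals covary consistently across segments and downweights power that is present in only one of them, which is the intuition behind comparing a two-signal measure with a single-signal spectrum.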

Publication types

Randomized Controlled Trial
Research Support, Non-U.S. Gov't

MeSH terms

Adult
Electromyography
Female
Humans
Male
Periodicity
Sound Spectrography
Speech / physiology*
Speech Acoustics
Speech Production Measurement
Young Adult