Perception of emotional valences and activity levels from vowel segments of continuous speech

Teija Waaramaa; Anne-Maria Laukkanen; Matti Airas; Paavo Alku

doi:10.1016/j.jvoice.2008.04.004

J Voice. 2010 Jan;24(1):30-8. doi: 10.1016/j.jvoice.2008.04.004. Epub 2008 Dec 25.

Authors

Teija Waaramaa¹, Anne-Maria Laukkanen, Matti Airas, Paavo Alku

Affiliation

¹ Department of Speech Communication and Voice Research, University of Tampere, Tampere, Finland. teija.waaramaa@uta.fi

PMID: 19111438
DOI: 10.1016/j.jvoice.2008.04.004

Abstract

This study investigated the role of voice source and formant frequencies in the perception of emotional valence and psychophysiological activity level from short vowel samples (approximately 150 milliseconds). Nine professional actors (five males and four females) read a prose passage simulating joy, tenderness, sadness, anger, and a neutral emotional state. The stress-carrying vowel [a:] was extracted from the Finnish word [ta:k:ahan] in continuous speech and analyzed for duration, fundamental frequency (F0), equivalent sound level (L_eq), alpha ratio, and formant frequencies F1-F4. Alpha ratio was calculated by subtracting the L_eq (dB) in the range 50 Hz-1 kHz from the L_eq in the range 1-5 kHz. The samples were inverse filtered by Iterative Adaptive Inverse Filtering, and the resulting estimates of the glottal flow were parameterized with the normalized amplitude quotient (NAQ = f_AC / (d_peak T)). Fifty listeners (mean age 28.5 years) identified the emotional valences from the randomized samples. Multinomial logistic regression analysis was used to study the interrelations of the parameters in perception. It appeared possible to identify valences from vowel samples of short duration (approximately 150 milliseconds). NAQ tended to differentiate between the valences and activity levels perceived in both genders. Voice source may not only reflect variations of F0 and L_eq, but may also have an independent role in expression, reflecting phonation types. To some extent, formant frequencies appeared to be related to valence perception, but no clear patterns could be identified. Coding of valence tends to be a complicated multiparameter phenomenon with wide individual variation.
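The two derived measures named in the abstract, alpha ratio and NAQ, can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that band levels are approximated by summed FFT power with an arbitrary dB reference (the reference cancels in the difference), and that in the NAQ formula f_AC is the peak-to-peak amplitude of the glottal flow, d_peak is the magnitude of the negative peak of the flow derivative, and T is the cycle duration; the abstract does not spell these symbols out. The function names are hypothetical, and the inverse filtering step that produces the flow estimate is not implemented here.

```python
import numpy as np


def band_level_db(signal, fs, f_lo, f_hi):
    """Level in dB (arbitrary reference) of the signal power between f_lo and f_hi (Hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    power = np.sum(np.abs(spectrum[mask]) ** 2)
    return 10.0 * np.log10(power + 1e-20)  # small offset avoids log(0)


def alpha_ratio(signal, fs):
    """Level in the 1-5 kHz band minus level in the 50 Hz-1 kHz band, in dB."""
    high = band_level_db(signal, fs, 1000.0, 5000.0)
    low = band_level_db(signal, fs, 50.0, 1000.0)
    return high - low


def naq(flow_cycle, fs):
    """NAQ = f_AC / (d_peak * T) for one cycle of an estimated glottal flow.

    Assumed symbol meanings (not stated in the abstract):
      f_AC   -- peak-to-peak flow amplitude
      d_peak -- magnitude of the negative peak of the flow derivative
      T      -- duration of the cycle in seconds
    """
    flow_cycle = np.asarray(flow_cycle, dtype=float)
    f_ac = np.max(flow_cycle) - np.min(flow_cycle)
    derivative = np.diff(flow_cycle) * fs  # finite-difference derivative
    d_peak = -np.min(derivative)
    period = len(flow_cycle) / fs
    return f_ac / (d_peak * period)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Synthetic test tone: strong 200 Hz component, weak 2 kHz component.
    tone = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
    print("alpha ratio (dB):", alpha_ratio(tone, fs))

    # Synthetic raised-cosine "flow" pulse, one 10 ms cycle (not real data).
    n = np.arange(160)
    pulse = 0.5 * (1.0 - np.cos(2 * np.pi * n / 160))
    print("NAQ:", naq(pulse, fs))
```

Because NAQ divides a flow amplitude by the product of a flow-derivative amplitude and a time, it is dimensionless and unaffected by the overall scaling of the flow estimate, which is convenient when the inverse-filtered signal is uncalibrated.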

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Emotions*
Female
Glottis / physiology
Humans
Language
Logistic Models
Male
Middle Aged
Phonetics*
Psychoacoustics
Psycholinguistics
Sex Characteristics
Speech Acoustics
Speech Perception*
Speech Production Measurement
Speech* / physiology
Time Factors