High and Wide: An In Silico Investigation of Frequency, Intensity, and Vibrato Effects on Widely Applied Acoustic Voice Perturbation and Noise Measures

Calvin Peter Baker; Meike Brockmann-Bauser; Suzanne C Purdy; Te Oti Rakena

doi:10.1016/j.jvoice.2023.10.007

J Voice. 2023 Nov 2:S0892-1997(23)00316-8. doi: 10.1016/j.jvoice.2023.10.007. Online ahead of print.

Authors

Calvin Peter Baker¹, Meike Brockmann-Bauser², Suzanne C Purdy³, Te Oti Rakena⁴

Affiliations

¹ Speech Science, School of Psychology, University of Auckland, Auckland, New Zealand; School of Music, University of Auckland, Auckland, New Zealand. Electronic address: calvin.baker@auckland.ac.nz.
² Department of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Zurich, Switzerland.
³ Speech Science, School of Psychology, University of Auckland, Auckland, New Zealand.
⁴ School of Music, University of Auckland, Auckland, New Zealand.

PMID: 37925330

Abstract

Objectives: This in silico study explored the effects of a wide range of fundamental frequency (f_o), source-spectrum tilt (SST), and vibrato extent (VE) on commonly used frequency and amplitude perturbation and noise measures.

Method: Using 53 synthesized tones produced in Madde, the study investigated the effects of stepwise increases in f_o, intensity (modeled by decreasing SST), and VE on the PRAAT parameters jitter (local) %, relative average perturbation (RAP) %, shimmer (local) %, amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB. A secondary experiment was conducted to determine whether any f_o effects on jitter, RAP, shimmer, APQ3, and HNR were stable. A total of 10 sinewaves were synthesized in Sopran from 100 to 1000 Hz using formant frequencies for /a/-, /i/-, and /u/-like vowels. All effects were statistically assessed with Kendall's tau-b and partial correlation.
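For readers unfamiliar with the perturbation measures named above, the following is a minimal sketch of how jitter (local), RAP, shimmer (local), and APQ3 are commonly defined from a sequence of glottal period durations and per-period peak amplitudes. It follows the definitions given in the Praat manual, but it is not the authors' analysis pipeline and does not reproduce Praat's period detection; the function names and the example values are hypothetical.

```python
from statistics import fmean


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value, in percent.
    Applied to periods this is jitter (local); to amplitudes, shimmer (local)."""
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return 100.0 * fmean(diffs) / fmean(values)


def three_point_quotient(values):
    """Mean absolute difference between each value and the average of itself
    and its two neighbours, divided by the mean value, in percent.
    Applied to periods this is RAP; to amplitudes, APQ3."""
    devs = [
        abs(values[i] - (values[i - 1] + values[i] + values[i + 1]) / 3.0)
        for i in range(1, len(values) - 1)
    ]
    return 100.0 * fmean(devs) / fmean(values)


# Hypothetical example data: five period durations (s) and peak amplitudes.
periods = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]

print("jitter (local) %:", local_perturbation(periods))
print("RAP %:", three_point_quotient(periods))
print("shimmer (local) %:", local_perturbation(amplitudes))
print("APQ3 %:", three_point_quotient(amplitudes))
```

Because both jitter and RAP are normalized by the mean period, any systematic cycle-to-cycle change in period length, not only random irregularity, raises their values.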

Results: Increasing f_o resulted in an overall increase in jitter, RAP, shimmer, and APQ3 values (P < 0.01). Oscillations of the data across the explored f_o range were observed in all measurement outputs. In the Sopran tests, the oscillatory pattern seen in the Madde f_o condition remained and showed differences between vowel conditions. Increasing intensity (decreasing SST) led to reduced pitch perturbation, amplitude perturbation, and HNR (P < 0.05). Increasing VE led to lower HNR and an almost linear increase of all other measures (P < 0.05).
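The reported vibrato effect on jitter can be illustrated with an idealized period model. The sketch below assumes a sinusoidal vibrato whose extent is the peak deviation from f_o in cents, with a hypothetical rate of 5.5 Hz and a 220 Hz tone; it generates period durations directly rather than synthesizing audio, so it models neither Madde nor Praat's pitch tracking, and all names and parameter values are illustrative assumptions.

```python
import math
from statistics import fmean


def jitter_local(periods):
    """Jitter (local) in percent: mean absolute difference between
    consecutive periods divided by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods[:-1], periods[1:])]
    return 100.0 * fmean(diffs) / fmean(periods)


def vibrato_periods(f0_hz, extent_cents, rate_hz=5.5, duration_s=1.0):
    """Period durations of a tone whose frequency is modulated sinusoidally.
    extent_cents is the assumed peak deviation from f0 in cents."""
    periods = []
    t = 0.0
    while t < duration_s:
        cents = extent_cents * math.sin(2.0 * math.pi * rate_hz * t)
        freq = f0_hz * 2.0 ** (cents / 1200.0)
        period = 1.0 / freq
        periods.append(period)
        t += period  # the next cycle starts where this one ends
    return periods


# Jitter for increasing vibrato extent (hypothetical values, in cents).
for extent in (0, 25, 50, 100):
    value = jitter_local(vibrato_periods(220.0, extent))
    print(f"VE = {extent:3d} cents -> jitter (local) = {value:.4f} %")
```

In this simplified model the tone contains no random irregularity at all, yet jitter grows with vibrato extent, because the measure cannot distinguish deliberate frequency modulation from cycle-to-cycle perturbation.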

Conclusion: These novel data offer a controlled demonstration of the behavior of jitter (local) %, RAP %, shimmer (local) %, APQ3 %, and HNR (dB) when f_o, SST, and VE are varied in synthesized tones. Because humans vary in all of these aspects in spoken language and vowel phonation, researchers should take potential resonance-harmonics-type effects into account when comparing intersubject or preintervention and postintervention data using these measures.

Keywords: Resonance-harmonics interactions; Singing voice analysis; Source-filter model; Synthetic; Voice diagnostics.