Comparison of Acoustic Voice Features Derived From Mobile Devices and Studio Microphone Recordings

Vitória S Fahed; Emer P Doheny; Monica Busse; Jennifer Hoblyn; Madeleine M Lowery

doi:10.1016/j.jvoice.2022.10.006

J Voice. 2022 Nov 12:S0892-1997(22)00312-5. doi: 10.1016/j.jvoice.2022.10.006. Online ahead of print.

Authors

Vitória S Fahed¹, Emer P Doheny², Monica Busse³, Jennifer Hoblyn⁴, Madeleine M Lowery²

Affiliations

¹ School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland. Electronic address: vitoria.fahed@ucdconnect.ie.
² School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland.
³ Centre for Trials Research, Cardiff University, Cardiff, UK.
⁴ School of Medicine, Trinity College Dublin, Dublin, Ireland; Bloomfield Health Services, Dublin, Ireland.

PMID: 36379826
DOI: 10.1016/j.jvoice.2022.10.006

Abstract

Objectives/hypothesis: Improvements in mobile device technology offer new opportunities for remote monitoring of voice for home and clinical assessment. However, there is a need to establish equivalence between features derived from signals recorded from mobile devices and gold standard microphone-preamplifiers. In this study, acoustic voice features from Android smartphone, tablet, and microphone-preamplifier recordings were compared.

Methods: Data were recorded from 37 volunteers (20 female) with no history of speech disorder and six volunteers with Huntington's disease (HD) during sustained vowel (SV) phonation, reading passage (RP), and five syllable repetition (SR) tasks. The following features were estimated: fundamental frequency median and standard deviation (F0 and SD F0), harmonics-to-noise ratio (HNR), local jitter, relative average perturbation of jitter (RAP), five-point period perturbation quotient (PPQ5), difference of differences of amplitude and periods (DDA and DDP), shimmer, and amplitude perturbation quotients (APQ3, APQ5, and APQ11).
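As an illustration of how the perturbation features listed above relate to one another, the following is a minimal sketch based on the definitions commonly used in Praat. The abstract does not state which software or exact formulas the authors used, so the function names, the window-based formulation, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean

def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to glottal periods this gives local jitter; applied to
    peak amplitudes it gives local shimmer (assumed Praat-style definition).
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)

def perturbation_quotient(values, window):
    """Mean absolute deviation of each value from the average of its
    `window`-point neighbourhood (window must be odd), divided by the mean value.

    window=3 on periods gives RAP, window=5 gives PPQ5;
    window=3, 5, 11 on amplitudes gives APQ3, APQ5, APQ11.
    """
    half = window // 2
    devs = [
        abs(values[i] - mean(values[i - half:i + half + 1]))
        for i in range(half, len(values) - half)
    ]
    return mean(devs) / mean(values)

# Hypothetical glottal periods (seconds) and peak amplitudes, not data from the study.
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100, 0.0103, 0.0098]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80, 0.83, 0.78]

jitter_local = local_perturbation(periods)
rap = perturbation_quotient(periods, 3)
ppq5 = perturbation_quotient(periods, 5)
ddp = 3 * rap  # DDP is three times RAP under this definition

shimmer_local = local_perturbation(amplitudes)
apq3 = perturbation_quotient(amplitudes, 3)
dda = 3 * apq3  # DDA is three times APQ3 under this definition
```

Under these definitions, the jitter-family features depend only on the timing of glottal cycles, whereas the shimmer-family features depend on cycle amplitudes.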

Results: Bland-Altman analysis revealed good agreement between microphone and mobile devices for fundamental frequency, jitter, RAP, PPQ5, and DDP during all tasks, and a bias for HNR, shimmer, and its variants (APQ3, APQ5, APQ11, and DDA). Significant differences were observed between devices for HNR, shimmer, and its variants for all tasks. High correlation was observed between devices for all features, except SD F0 for RP. Similar results were observed in the HD group for the SV and SR tasks. Biological sex had a significant effect on F0 and HNR during all tasks, and on jitter, RAP, PPQ5, DDP, and shimmer during RP and SR. No significant effect of age was observed.
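For readers unfamiliar with Bland-Altman analysis, the sketch below shows the standard calculation of the bias (mean difference between two measurement methods) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The function name and the paired values are hypothetical; the abstract does not report the authors' implementation or any per-subject data.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences (a - b); the limits of
    agreement are bias -/+ 1.96 times the sample SD of the differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical shimmer values (%) for the same speakers on two devices.
microphone = [2.1, 3.4, 2.8, 4.0, 3.1]
smartphone = [2.9, 4.1, 3.3, 4.9, 3.8]

bias, lower, upper = bland_altman(microphone, smartphone)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

A bias close to zero with narrow limits indicates good agreement between devices; a bias clearly away from zero corresponds to the kind of systematic offset reported above for HNR and shimmer.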

Conclusions: Mobile devices provided good agreement with state-of-the-art, high-quality microphones during structured speech tasks for features derived from frequency components of the audio recordings. Caution should be taken when estimating HNR, shimmer, and its variants from recordings made with mobile devices.

Keywords: Acoustic voice features; Huntington's disease; Microphone; Mobile devices.