Probability-Based Best Sample Selection for Acoustic Analysis of Normal and Disordered Voices

J Voice. 2022 Jan;36(1):21-26. doi: 10.1016/j.jvoice.2020.03.011. Epub 2020 May 29.

Abstract

Purpose: Acoustic analysis is a commonly used method for quantitatively measuring vocal fold function. The accuracy of acoustic analysis depends upon the operator selecting a stable segment of the voice sample to analyze. This paper proposes a novel method to more accurately and reliably select a stable voice segment.

Study design: Four selection methods were implemented to evaluate each raw audio signal and determine the most stable segment of each signal: the proposed modal periodogram method, the moving window method, the midvowel method, and the whole vowel method. Three acoustic parameters of interest, namely perturbation (jitter), correlation dimension (D2), and spectrum convergence ratio (SCR), were calculated for 48 phonation samples to evaluate each method.
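To make the three comparison methods concrete, the following is a minimal sketch and not the study's implementation. It assumes the voice sample has already been reduced to a list of glottal cycle periods, that the moving window method keeps the window with the lowest percent jitter, and that the midvowel method keeps a centered window. The function names and the `width` and `step` parameters are hypothetical.

```python
from statistics import mean

def local_jitter(periods):
    """Percent jitter: mean absolute difference between consecutive
    cycle periods, divided by the mean period, times 100."""
    diffs = [abs(a - b) for a, b in zip(periods[1:], periods[:-1])]
    return 100.0 * mean(diffs) / mean(periods)

def whole_vowel(periods):
    """Whole vowel method: analyze the full sample (start, end indices)."""
    return 0, len(periods)

def midvowel(periods, width):
    """Midvowel method (assumed): take a window centered in the sample."""
    start = (len(periods) - width) // 2
    return start, start + width

def moving_window(periods, width, step=1):
    """Moving window method (assumed): slide a window across the sample
    and keep the position with the lowest percent jitter."""
    starts = range(0, len(periods) - width + 1, step)
    best = min(starts, key=lambda s: local_jitter(periods[s:s + width]))
    return best, best + width
```

With a sample whose onset and offset are irregular, `moving_window` should land inside the steady middle portion, whereas `whole_vowel` includes the unstable edges by construction.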

Methods: The proposed modal periodogram method uses a minimum mean-square error approach to calculate a stable modal periodogram and obtain the most stable segment. The Wilcoxon signed-rank test was used to compare jitter, D2, and SCR values acquired using the modal periodogram method against those acquired using the current standard segment selection methods.
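The general idea of selecting a segment by mean-square error against a reference periodogram can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the abstract does not specify how the modal periodogram is estimated, so the sketch substitutes a bin-wise median across frames as the reference. The function names and the `frame_len`, `hop`, and `n_frames` parameters are hypothetical.

```python
import cmath
import math
from statistics import median

def periodogram(frame):
    """Naive DFT periodogram |X[k]|^2 / N for bins k = 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        spec.append(abs(x) ** 2 / n)
    return spec

def select_stable_segment(signal, frame_len, hop, n_frames):
    """Return (start, end) sample indices of the run of n_frames frames
    whose periodograms deviate least from the reference periodogram."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    pgrams = [periodogram(signal[i:i + frame_len]) for i in starts]
    # ASSUMPTION: the bin-wise median stands in for the modal periodogram.
    reference = [median(bins) for bins in zip(*pgrams)]
    # Mean-square error of each frame's periodogram against the reference.
    mse = [sum((p - r) ** 2 for p, r in zip(pg, reference)) / len(reference)
           for pg in pgrams]
    # Pick the run of consecutive frames with the smallest total error.
    best = min(range(len(pgrams) - n_frames + 1),
               key=lambda i: sum(mse[i:i + n_frames]))
    start = best * hop
    return start, start + (n_frames - 1) * hop + frame_len
```

The naive DFT keeps the sketch dependency-free; a real analysis would more likely use an FFT routine and a windowed periodogram estimate.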

Results: The modal periodogram method yielded significantly lower D2 values and significantly higher SCR values for both normal and disordered voice samples (P < 0.01). This indicates that the modal periodogram method is better suited to selecting a stable audio segment than the other selection methods.

Keywords: Minimum mean square error; Modal periodogram; Modal presence probability; Voice segment selection.

MeSH terms

  • Acoustics
  • Humans
  • Phonation
  • Probability
  • Speech Acoustics*
  • Voice*