Validation of Cepstral Acoustic Analysis for Normal and Pathological Voice in the Japanese Language

J Voice. 2022 Nov;36(6):770-776. doi: 10.1016/j.jvoice.2020.08.026. Epub 2020 Sep 18.

Abstract

Objectives: Cepstral analysis does not require the detection of pitch within waveforms, which makes it suitable for acoustic evaluation of connected speech and severely disordered voices. Although the utility of cepstral measurements, including cepstral peak prominence (CPP) and cepstral spectral index of dysphonia (CSID), has been reported for several languages, it has yet to be demonstrated in the Japanese language. The current study aimed to investigate the utility of cepstral acoustic analysis for the Japanese language as an indicator of the presence of dysphonia and of its severity.

Methods: Ninety-five patients with dysphonia and thirty volunteers without voice complaints uttered the sustained vowel /a/ and read four Japanese sentences designed to elicit different laryngeal behaviors. The recorded voice samples were evaluated perceptually by three raters according to the GRBAS scale (grade) and overall severity (OS) on a visual analog scale. Participants were then divided into four groups based on grade and OS: non-, mildly, moderately, and severely dysphonic groups. For the acoustic analysis, CPP and CSID were computed using the Analysis of Dysphonia in Speech and Voice program, while jitter percentage (Jitt), shimmer percentage (Shim), and noise-to-harmonic ratio were computed using the Multi-Dimensional Voice Program.

Results: Statistical analysis revealed that both CPP and CSID differed significantly between all groups, except between the non-dysphonic and mildly dysphonic groups when grouping was based on grade. Pearson correlation analysis between the acoustic measurements and the perceptual ratings revealed that the absolute correlation coefficients for CPP, CSID, and Jitt were greater than 0.7. Specifically, those for CPP and CSID were greater than 0.8 for OS. Receiver operating characteristic (ROC) curve analysis showed that the area under the curve (AUC) for CPP, CSID, Jitt, and Shim was greater than 0.8 for both grade and OS. The cut-off values for CPP and CSID, as determined by the Youden Index, were 6.74-7.18 and 12.16-20.39, respectively.
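
Illustrative note: the abstract does not describe how the ROC statistics were computed, so the following is only a minimal sketch of one common way to obtain an AUC and a Youden Index cut-off from a single acoustic measure. The function names, the `higher_is_positive` flag, and the example values are all hypothetical and are not taken from the study. The flag is there because the study's negative correlations are commonly taken to mean that lower CPP indicates dysphonia while higher CSID does (an assumption here, not stated in the abstract).

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int],
        higher_is_positive: bool = True) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 = dysphonic (positive), 0 = non-dysphonic (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    sign = 1.0 if higher_is_positive else -1.0
    wins = 0.0
    for p in pos:
        for n in neg:
            if sign * p > sign * n:
                wins += 1.0      # positive sample ranked as more abnormal
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int],
                  higher_is_positive: bool = True) -> Tuple[float, float]:
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1.

    Every observed score is tried as a threshold; on ties in J the
    lowest threshold is kept.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        if higher_is_positive:
            predicted = [s >= t for s in scores]
        else:
            predicted = [s <= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        tn = sum(1 for p, y in zip(predicted, labels) if not p and not y)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    # Made-up CPP-like values (not study data); lower = more dysphonic.
    cpp = [9.1, 8.4, 7.9, 7.5, 6.9, 6.2, 5.8, 4.3]
    dysphonic = [0, 0, 1, 0, 0, 1, 1, 1]
    print(auc(cpp, dysphonic, higher_is_positive=False))
    print(youden_cutoff(cpp, dysphonic, higher_is_positive=False))
```

For a CSID-like measure, where higher values are assumed to indicate dysphonia, the same functions would be called with the default `higher_is_positive=True`.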

Conclusion: The current study demonstrated the validity of CPP and CSID as indicators of dysphonia and indices of dysphonia severity in the Japanese language.

Keywords: Cepstral acoustic analysis; Cepstral peak prominence (CPP); Cepstral spectral index of dysphonia (CSID); Dysphonia.

MeSH terms

  • Acoustics
  • Dysphonia* / diagnosis
  • Hoarseness
  • Humans
  • Japan
  • Language
  • Severity of Illness Index
  • Speech Acoustics
  • Speech Perception*
  • Speech Production Measurement / methods
  • Voice Quality