A joint framework for blind prediction of binaural speech intelligibility and perceived listening effort

Jan Rennies; Saskia Röttges; Rainer Huber; Christopher F Hauth; Thomas Brand

doi:10.1016/j.heares.2022.108598


Hear Res. 2022 Dec:426:108598. doi: 10.1016/j.heares.2022.108598. Epub 2022 Aug 8.

Authors

Jan Rennies¹, Saskia Röttges², Rainer Huber³, Christopher F Hauth², Thomas Brand²

Affiliations

¹ Fraunhofer IDMT, Hearing, Speech and Audio Technology and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany. Electronic address: jan.rennies@idmt.fraunhofer.de.
² Department für Medizinische Physik und Akustik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence Hearing4all, Oldenburg, Germany.
³ Fraunhofer IDMT, Hearing, Speech and Audio Technology and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany.

PMID: 35995688

Abstract

Speech perception is strongly affected by noise and reverberation in the listening room, and binaural processing can substantially facilitate speech perception in conditions where target speech and maskers originate from different directions. Most studies and proposed models for predicting spatial unmasking have focused on speech intelligibility. The present study introduces a model framework that predicts both speech intelligibility and perceived listening effort from the same output measure. The framework combines a blind binaural processing stage, which employs a blind equalization-cancelation (EC) mechanism, with a blind backend based on phoneme probability classification. Neither the frontend nor the backend requires any additional information, such as the source directions, the signal-to-noise ratio (SNR), or the number of sources, which allows a fully blind perceptual assessment of binaural input signals consisting of target speech mixed with noise. The model is validated against a recent data set in which speech intelligibility and perceived listening effort were measured for a range of acoustic conditions differing in reverberation and binaural cues [Rennies and Kidd (2018), J. Acoust. Soc. Am. 144, 2147-2159]. Predictions of the proposed model are compared with those of a non-blind binaural model consisting of a non-blind EC stage and a backend based on the speech intelligibility index. The analyses indicated that all main trends observed in the experiments were correctly predicted by the blind model. For speech intelligibility, the overall proportion of variance explained by the blind model (R² = 0.94) was slightly lower than that of the non-blind model (R² = 0.98). For listening effort, both models showed lower prediction accuracy but still explained significant proportions of the observed variance (R² = 0.88 and R² = 0.71 for the non-blind and blind model, respectively). Closer inspection showed that the differences between data and predictions were largest for binaural conditions at high SNRs, where both models, and particularly the blind version, tended to underestimate the listening effort perceived by human listeners.
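The idea behind a blind EC stage can be illustrated with a minimal broadband sketch. It assumes that the interaural delay and gain of the masker are unknown and chooses them by a minimum-output-energy criterion: it picks the equalization parameters that leave the least residual energy after one ear signal is subtracted from the other. This criterion is a simple stand-in chosen for illustration; the abstract does not state which criterion the authors' blind EC stage uses. The function names `ec_cancel` and `blind_ec`, the delay range, and the gain grid are hypothetical. Binaural models of this kind typically apply EC separately in each auditory frequency band and include internal processing errors; both are omitted here.

```python
import random


def ec_cancel(left, right, delay, gain):
    """Equalize the right-ear signal (integer sample delay, linear gain) and
    subtract it from the left-ear signal (the 'cancelation' step)."""
    n = len(left)
    out = []
    for i in range(n):
        j = i - delay  # positive delay shifts the right-ear signal later in time
        r = right[j] if 0 <= j < n else 0.0  # zero-pad outside the signal
        out.append(left[i] - gain * r)
    return out


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def blind_ec(left, right, max_delay=10, gains=None):
    """Hypothetical blind EC: grid-search the delay and gain that minimize the
    residual energy, without knowing source direction, SNR, or number of sources."""
    if gains is None:
        gains = [k / 20 for k in range(10, 31)]  # 0.5, 0.55, ..., 1.5
    best = None
    for d in range(-max_delay, max_delay + 1):
        for g in gains:
            e = energy(ec_cancel(left, right, d, g))
            if best is None or e < best[0]:
                best = (e, d, g)
    return best[1], best[2]


# Toy example: a noise masker reaches the left ear 3 samples after the right ear.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
right = noise
left = [0.0, 0.0, 0.0] + noise[:-3]
print(blind_ec(left, right))  # (3, 1.0): the masker is cancelled completely
```

If a target that is identical at both ears (for example, a frontal talker) were added to both signals, the same subtraction would still remove the laterally displaced masker but leave a comb-filtered version of the target, which is the basic intuition behind binaural unmasking. With a target present, the minimum-energy choice is only expected to land near the masker's parameters when the masker dominates the mixture.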
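The abstract reports model accuracy as the proportion of variance explained (R²) but does not say how R² was computed. A minimal sketch, assuming the squared Pearson correlation between measured and predicted values (one common choice for model-data comparisons), is shown below. Under this definition, a constant offset between data and predictions does not lower R², whereas the alternative definition 1 − SS_res/SS_tot would penalize it. The function name `r_squared` and the input values are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between measured data and model predictions."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Illustrative values only (not from the study): predictions that are offset
# by a constant 2.0 still give R^2 = 1 under this definition.
measured = [-8.0, -6.5, -4.0, -2.0]
predicted = [m + 2.0 for m in measured]
print(r_squared(measured, predicted))  # 1.0
```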

Keywords: Binaural speech intelligibility model; Listening effort; Reverberation; Spatial hearing; Speech intelligibility.

Publication types

Review
Research Support, Non-U.S. Gov't

MeSH terms

Humans
Listening Effort
Noise / adverse effects
Perceptual Masking
Signal-To-Noise Ratio
Speech Intelligibility*
Speech Perception*