A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning

Tassadaq Hussain; Muhammad Diyan; Mandar Gogate; Kia Dashtipour; Ahsan Adeel; Yu Tsao; Amir Hussain

doi:10.1109/EMBC48229.2022.9871113

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:2581-2584. doi: 10.1109/EMBC48229.2022.9871113.

PMID: 36085897

Abstract

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals. Although they improve speech quality, such approaches do not deliver the required levels of speech intelligibility in everyday noisy environments. Intelligibility-oriented (I-O) loss functions have recently been developed to train DL approaches for robust speech enhancement. Here, we formulate, for the first time, a novel canonical-correlation-based I-O loss function to train DL algorithms more effectively. Specifically, we present a canonical-correlation-based short-time objective intelligibility (CC-STOI) cost function to train a fully convolutional neural network (FCN) model. We carry out comparative simulation experiments to show that our CC-STOI based speech enhancement framework outperforms state-of-the-art DL models trained with conventional distance-based and STOI-based loss functions, using objective and subjective evaluation measures, for both unseen speakers and unseen noises. Ongoing work is evaluating the proposed approach for the design of robust hearing-assistive technology.
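
The abstract does not give the CC-STOI formulation, so the following is only a minimal sketch of the general idea of a canonical-correlation-based loss, not the authors' cost function. It assumes the clean and enhanced signals have already been converted to frame-by-band feature matrices (for example, band envelopes), and it scores their similarity by the mean canonical correlation. The function names, the regularisation constant `reg`, and the choice of averaging the correlations are all illustrative assumptions.

```python
import numpy as np


def _inv_sqrt(mat):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    # Scale each eigenvector column by 1/sqrt(eigenvalue), then recombine.
    return (vecs / np.sqrt(vals)) @ vecs.T


def canonical_correlations(x, y, reg=1e-6):
    """Canonical correlations between x (frames x p) and y (frames x q).

    reg is a small ridge term (an assumption of this sketch) that keeps
    the covariance matrices invertible.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)
    # The singular values of the whitened cross-covariance are the
    # canonical correlations.
    whitened = _inv_sqrt(sxx) @ sxy @ _inv_sqrt(syy)
    return np.clip(np.linalg.svd(whitened, compute_uv=False), 0.0, 1.0)


def cc_loss(clean_feats, enhanced_feats):
    """Hypothetical loss: negative mean canonical correlation.

    Lower is better; it approaches -1 when the two feature sets are
    linearly related.
    """
    return -float(np.mean(canonical_correlations(clean_feats, enhanced_feats)))


# Toy usage with random stand-ins for band-envelope features.
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 5))
noisy = clean + 2.0 * rng.standard_normal((200, 5))
print(cc_loss(clean, clean), cc_loss(clean, noisy))
```

To train a network with such a loss, the same computation would have to be written in an automatic-differentiation framework so that gradients can flow through the eigendecomposition and the singular value decomposition; the NumPy version above only evaluates the score.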

MeSH terms

Algorithms
Canonical Correlation Analysis
Deep Learning*
Hearing
Speech Intelligibility*