Estimation of a priori signal-to-noise ratio using neurograms for speech enhancement

Wissam A Jassim; Naomi Harte

doi:10.1121/10.0001324

J Acoust Soc Am. 2020 Jun;147(6):3830. doi: 10.1121/10.0001324.

Authors

Wissam A Jassim¹, Naomi Harte¹

Affiliation

¹ Sigmedia Group, ADAPT Centre, School of Engineering, Trinity College Dublin, Ireland.

PMID: 32611151
DOI: 10.1121/10.0001324

Abstract

In statistical-based speech enhancement algorithms, the a priori signal-to-noise ratio (SNR) must be estimated to calculate the required spectral gain function. This paper proposes a method to improve this estimation using features derived from the neural responses of the auditory-nerve (AN) system. The neural responses, interpreted as a neurogram (NG), are simulated for noisy speech using a computational model of the AN system with a range of characteristic frequencies (CFs). Two machine learning algorithms were explored to train the estimation model based on NG features: support vector regression and a convolutional neural network. The proposed estimator was placed in a common speech enhancement system, and three conventional spectral gain functions were employed to estimate the enhanced signal. The proposed method was tested using the NOIZEUS database at different SNR levels, and various speech quality and intelligibility measures were employed for performance evaluation. The a priori SNR estimated from NG features achieved better quality and intelligibility scores than those of recent estimators, especially for highly distorted speech and low SNR values.
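To illustrate how an a priori SNR estimate feeds a spectral gain function, the sketch below applies the classical Wiener gain, G = ξ / (1 + ξ), to the magnitudes of one noisy frame, where ξ is the a priori SNR on a linear scale. The abstract does not name the three gain functions used, so the choice of the Wiener gain is an assumption made for illustration. The function names, the `gain_floor` parameter, and the example values are likewise hypothetical and not taken from the paper.

```python
def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def wiener_gain(xi: float) -> float:
    """Wiener spectral gain for a linear-scale a priori SNR xi >= 0."""
    if xi < 0.0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


def enhance_frame(noisy_magnitudes, xi_per_bin, gain_floor=0.1):
    """Scale each frequency bin of a noisy frame by its (floored) gain.

    noisy_magnitudes: spectral magnitudes of one noisy frame.
    xi_per_bin: a priori SNR estimate per bin (linear scale), from any
        estimator; a learned estimator would be plugged in here.
    gain_floor: hypothetical lower bound on the gain, a common way to
        limit musical-noise artifacts.
    """
    if len(noisy_magnitudes) != len(xi_per_bin):
        raise ValueError("inputs must have the same number of bins")
    return [
        max(wiener_gain(xi), gain_floor) * magnitude
        for magnitude, xi in zip(noisy_magnitudes, xi_per_bin)
    ]


if __name__ == "__main__":
    # Three bins with assumed a priori SNRs of 10 dB, 0 dB, and -20 dB.
    xi = [db_to_linear(v) for v in (10.0, 0.0, -20.0)]
    print(enhance_frame([1.0, 1.0, 1.0], xi))
```

Bins with a high estimated a priori SNR pass almost unchanged, while bins dominated by noise are attenuated down to the floor, which is why the accuracy of the SNR estimate matters most at low SNR.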

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Noise / adverse effects
Signal-To-Noise Ratio
Speech Intelligibility
Speech Perception*
Speech*