Cochlea-inspired speech recognition interface

Mladen Russo; Maja Stella; Marjan Sikora; Matko Šarić

doi:10.1007/s11517-019-01963-6

Med Biol Eng Comput. 2019 Jun;57(6):1393-1403. doi: 10.1007/s11517-019-01963-6. Epub 2019 Mar 4.

Authors

Mladen Russo¹, Maja Stella², Marjan Sikora², Matko Šarić²

Affiliations

¹ Laboratory for Smart Environment Technologies, FESB - University of Split, Split, Croatia. mrusso@fesb.hr.
² Laboratory for Smart Environment Technologies, FESB - University of Split, Split, Croatia.

PMID: 30830542
DOI: 10.1007/s11517-019-01963-6

Abstract

Automatic speech recognition (ASR) technology provides a natural interface for human-machine interaction. Typical ASR systems can achieve high performance in quiet environments but, unlike humans, perform poorly in real-world situations. To better simulate the human auditory periphery and improve the performance in realistic noisy scenarios, we propose two models of speech recognition front-ends based on a biophysical cochlear model. The first front-end is based on the method of signal reconstruction from a basilar membrane response. When applied to noisy speech, this method results in improved signal quality. This method can be used as a preprocessing step in a standard ASR system and can also be used as a noise reduction technique for other applications. The second front-end we propose is based on the construction of speech recognition coefficients directly from a basilar membrane response. Experimental results using a continuous-density hidden Markov model (HMM) recognizer demonstrate significant improvement in performance compared to standard Mel-frequency cepstral coefficients (MFCC) in various types of noisy conditions.

Graphical Abstract: Speech recognition model based on cochlear front-end.
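For orientation, the MFCC baseline named in the abstract can be illustrated with a minimal sketch of a generic textbook MFCC front-end. This is not the authors' implementation and does not model the cochlea or the basilar membrane response. The function names (`mel_filterbank`, `dct_ii`, `mfcc`) and every parameter value (16 kHz sampling, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) are assumptions chosen for illustration; the abstract does not state the configuration used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Common O'Shaughnessy mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with centers spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling slope, peak of 1 at center
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    # Unnormalized DCT-II along the last axis, keeping the first n_out terms
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # All defaults are illustrative assumptions, not values from the paper.
    # The signal must contain at least frame_len samples.
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies, floored to avoid log(0)
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Cepstral coefficients via DCT of the log energies
    return dct_ii(log_energies, n_ceps)
```

Calling `mfcc(x)` on a one-dimensional array of samples returns one row of cepstral coefficients per frame. In a conventional ASR pipeline, vectors of this kind would be the observations passed to an HMM recognizer; the paper's second front-end replaces them with coefficients derived from the cochlear model's output.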

Keywords: Biophysical cochlear model; Noise robustness; Speech recognition interface.

MeSH terms

Biophysical Phenomena
Cochlea / physiology*
Humans
Markov Chains
Models, Biological
Signal Processing, Computer-Assisted
Sound Spectrography
Speech / physiology*

Grants and funding

UIP-2014-09-3875/Hrvatska Zaklada za Znanost (Croatian Science Foundation)