Cochlea-inspired speech recognition interface

Med Biol Eng Comput. 2019 Jun;57(6):1393-1403. doi: 10.1007/s11517-019-01963-6. Epub 2019 Mar 4.

Abstract

Automatic speech recognition (ASR) technology provides a natural interface for human-machine interaction. Typical ASR systems can achieve high performance in quiet environments but, unlike humans, perform poorly in real-world situations. To better simulate the human auditory periphery and improve the performance in realistic noisy scenarios, we propose two models of speech recognition front-ends based on a biophysical cochlear model. The first front-end is based on the method of signal reconstruction from a basilar membrane response. When applied to noisy speech, this method results in improved signal quality. This method can be used as a preprocessing step in a standard ASR system and can also be used as a noise reduction technique for other applications. The second front-end we propose is based on the construction of speech recognition coefficients directly from a basilar membrane response. Experimental results using a continuous-density hidden Markov model (HMM) recognizer demonstrate significant improvement in performance compared to standard Mel-frequency cepstral coefficients (MFCC) in various types of noisy conditions.

Graphical Abstract: Speech recognition model based on cochlear front-end.
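For orientation, the MFCC baseline that the abstract compares against can be sketched in a few lines. This is a generic textbook-style MFCC pipeline, not the authors' cochlear front-end, whose details the abstract does not give. The function names and all parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients, 0.97 pre-emphasis) are common defaults assumed here for illustration only.

```python
import numpy as np

def hz_to_mel(f):
    # Widely used mel-scale formula (one of several conventions).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Assumes `signal` is a 1-D float array at least `frame_len` samples long.
    # 1. Pre-emphasis boosts high frequencies (0.97 is an assumed default).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. Unnormalized DCT-II decorrelates the log energies; keep n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2.0 * n_filters))
    return log_energies @ basis.T

# Usage: one second of a noisy 440 Hz tone yields one feature row per frame.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
noisy_tone = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(16000)
features = mfcc(noisy_tone)
```

Each row of `features` is the kind of per-frame vector an HMM recognizer would consume. A cochlea-based front-end such as the second one proposed in the abstract would replace the spectral analysis stage with a basilar membrane response, but how the authors derive coefficients from that response is not specified here.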

Keywords: Biophysical cochlear model; Noise robustness; Speech recognition interface.

MeSH terms

  • Biophysical Phenomena
  • Cochlea / physiology*
  • Humans
  • Markov Chains
  • Models, Biological
  • Signal Processing, Computer-Assisted
  • Sound Spectrography
  • Speech / physiology*