Auditory attention decoding from EEG-based Mandarin speech envelope reconstruction

Hear Res. 2022 Sep 1:422:108552. doi: 10.1016/j.heares.2022.108552. Epub 2022 Jun 11.

Abstract

In a cocktail party scenario, the human auditory system extracts information from a specific speaker of interest and ignores the others. Many studies have focused on auditory attention decoding (AAD), but their speech stimuli were mainly in non-tonal languages. We used a tonal language (Mandarin) as the speech stimulus and constructed a Long Short-Term Memory (LSTM) architecture to reconstruct the speech envelope from electroencephalogram (EEG) data. The correlation coefficient between the reconstructed envelope and each candidate envelope was calculated to determine the subject's auditory attention. The proposed LSTM architecture outperformed the linear models. The average decoding accuracy in the cross-subject and inter-subject cases ranged from 63.02% to 74.29%, with the highest accuracy of 89.1% at a decision window of 0.15 s. In addition, the beta-band rhythm was found to play an essential role in distinguishing the attention and non-attention states. These results provide a new AAD architecture to help develop neuro-steered hearing devices, especially for tonal languages.
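The correlation-based decision step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pearson_r`, `decode_attention`), the two-speaker setup, the window length, and the synthetic signals are all assumptions made for illustration, and the "reconstructed" envelope here is simulated noise-corrupted data rather than the output of an LSTM.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    # Return 0 for a constant signal instead of dividing by zero.
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def decode_attention(reconstructed, candidates):
    """Pick the candidate envelope most correlated with the reconstruction.

    Hypothetical helper: returns the index of the winning candidate
    and the list of correlation scores for one decision window.
    """
    scores = [pearson_r(reconstructed, c) for c in candidates]
    return int(np.argmax(scores)), scores


# Synthetic stand-ins for one decision window (assumed length: 640 samples).
rng = np.random.default_rng(0)
n_samples = 640
speaker_a = np.abs(rng.standard_normal(n_samples))  # assumed attended envelope
speaker_b = np.abs(rng.standard_normal(n_samples))  # assumed ignored envelope

# Simulated reconstruction: the attended envelope plus noise.
reconstructed = speaker_a + 0.5 * rng.standard_normal(n_samples)

attended, scores = decode_attention(reconstructed, [speaker_a, speaker_b])
print("decoded speaker index:", attended)
print("correlations:", scores)
```

In a real pipeline the same comparison would be repeated for every decision window, and decoding accuracy would be the fraction of windows in which the selected candidate matches the speaker the subject was instructed to attend to.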

Keywords: Auditory attention; Cocktail party; Electroencephalograph; Long Short-Term Memory; Mandarin.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acoustic Stimulation / methods
  • Attention / physiology
  • Electroencephalography
  • Humans
  • Linear Models
  • Speech
  • Speech Perception* / physiology