Congruent audiovisual speech enhances auditory attention decoding with EEG

Zhen Fu; Xihong Wu; Jing Chen

doi:10.1088/1741-2552/ab4340

Congruent audiovisual speech enhances auditory attention decoding with EEG

J Neural Eng. 2019 Nov 6;16(6):066033. doi: 10.1088/1741-2552/ab4340.

Authors

Zhen Fu¹, Xihong Wu, Jing Chen

Affiliation

¹ Department of Machine Intelligence, Speech and Hearing Research Center, and Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, People's Republic of China.

PMID: 31505476
DOI: 10.1088/1741-2552/ab4340

Abstract

Objective: The auditory attention decoding (AAD) approach can be used to determine the identity of the attended speaker during an auditory selective attention task, by analyzing measurements of electroencephalography (EEG) data. The AAD approach has the potential to guide the design of speech enhancement algorithms in hearing aids, i.e. to identify the speech stream of listener's interest so that hearing aids algorithms can amplify the target speech and attenuate other distracting sounds. This would consequently result in improved speech understanding and communication and reduced cognitive load, etc. The present work aimed to investigate whether additional visual input (i.e. lipreading) would enhance the AAD performance for normal-hearing listeners.

Approach: In a two-talker scenario, where auditory stimuli of audiobooks narrated by two speakers were presented, multi-channel EEG signals were recorded while participants were selectively attending to one speaker and ignoring the other one. Speakers' mouth movements were recorded during narrating for providing visual stimuli. Stimulus conditions included audio-only, visual input congruent with either (i.e. attended or unattended) speaker, and visual input incongruent with either speaker. The AAD approach was performed separately for each condition to evaluate the effect of additional visual input on AAD.

Main results: Relative to the audio-only condition, the AAD performance was found improved by visual input only when it was congruent with the attended speech stream, and the improvement was about 14 percentage points on decoding accuracy. Cortical envelope tracking activities in both auditory and visual cortex were demonstrated stronger for the congruent audiovisual speech condition than other conditions. In addition, a higher AAD robustness was revealed for the congruent audiovisual condition, with reduced channel number and trial duration achieving higher accuracy than the audio-only condition.

Significance: The present work complements previous studies and further manifests the feasibility of the AAD-guided design of hearing aids in daily face-to-face conversations. The present work also has a directive significance for designing a low-density EEG setup for the AAD approach.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Acoustic Stimulation / methods*
Adult
Attention / physiology*
Auditory Perception / physiology
Electroencephalography / methods*
Female
Humans
Lipreading
Male
Photic Stimulation / methods*
Speech Perception / physiology*
Visual Perception / physiology*
Young Adult