Silent speech command word recognition using stepped frequency continuous wave radar

Christoph Wagner; Petr Schaffer; Pouriya Amini Digehsara; Michael Bärhold; Dirk Plettemeier; Peter Birkholz

doi:10.1038/s41598-022-07842-9


Sci Rep. 2022 Mar 9;12(1):4192. doi: 10.1038/s41598-022-07842-9.

Authors

Christoph Wagner^#¹, Petr Schaffer^#², Pouriya Amini Digehsara³, Michael Bärhold⁴, Dirk Plettemeier⁴, Peter Birkholz³

Affiliations

¹ Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069, Dresden, Germany. christoph.wagner@tu-dresden.de.
² Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069, Dresden, Germany. petr.schaffer@tu-dresden.de.
³ Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069, Dresden, Germany.
⁴ Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069, Dresden, Germany.

^# Contributed equally.

Abstract

Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built custom stepped-frequency continuous-wave (SFCW) radar hardware to measure, at an update rate of 100 Hz, how the transmission spectra between three antennas placed on both cheeks and the chin change during speech. We then recorded a command word corpus of 40 phonetically balanced, two-syllable German words and the German digits zero to nine for two speakers, and used a bidirectional long short-term memory (BiLSTM) network to evaluate speaker-dependent recognition accuracy on this 50-word corpus in both a multi-session and an inter-session setting. We obtained recognition accuracies of 99.17% in the multi-session setting and 88.87% in the inter-session setting. These results show that the transmission spectra are very well suited to discriminating individual words from one another, even across different sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.
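The abstract describes a pipeline in which each radar measurement (one every 10 ms at the stated 100 Hz update rate) yields a spectral frame, and the sequence of frames for one utterance is classified by a bidirectional LSTM into one of 50 words. The following is a minimal, untrained BiLSTM forward pass in pure Python that illustrates this flow under stated assumptions: the per-frame feature dimension (`FEATURE_DIM`), hidden size (`HIDDEN_DIM`), random weights, random input frames, and the choice to classify from the concatenated final forward and backward hidden states are all hypothetical and are not the authors' architecture or preprocessing.

```python
import math
import random

FRAME_RATE_HZ = 100  # measurement update rate stated in the abstract
NUM_WORDS = 50       # 40 two-syllable words + digits zero to nine
FEATURE_DIM = 12     # hypothetical: spectral features per radar frame
HIDDEN_DIM = 8       # hypothetical: LSTM hidden size per direction


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w, v):
    # Multiply matrix w (list of rows) by vector v.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def init_lstm(input_dim, hidden_dim, rng):
    # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (c).
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for g in "ifoc"}


def lstm_pass(params, frames, hidden_dim):
    # Run a standard LSTM over the frames and return the final hidden state.
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frames:
        pre = {}
        for g, (w, u, b) in params.items():
            pre[g] = [wx + uh + bias for wx, uh, bias in zip(matvec(w, x), matvec(u, h), b)]
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        cand = [math.tanh(z) for z in pre["c"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def bilstm_logits(frames, fwd, bwd, out_w, out_b, hidden_dim):
    # Forward direction reads the utterance in time order, backward in reverse.
    h_fwd = lstm_pass(fwd, frames, hidden_dim)
    h_bwd = lstm_pass(bwd, list(reversed(frames)), hidden_dim)
    summary = h_fwd + h_bwd  # concatenation, length 2 * hidden_dim
    return [z + b for z, b in zip(matvec(out_w, summary), out_b)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


# Usage: a hypothetical 0.8 s utterance sampled at 100 Hz gives 80 frames.
rng = random.Random(0)
num_frames = round(0.8 * FRAME_RATE_HZ)
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(num_frames)]
fwd = init_lstm(FEATURE_DIM, HIDDEN_DIM, rng)
bwd = init_lstm(FEATURE_DIM, HIDDEN_DIM, rng)
out_w = [[rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN_DIM)] for _ in range(NUM_WORDS)]
out_b = [0.0] * NUM_WORDS
probs = softmax(bilstm_logits(frames, fwd, bwd, out_w, out_b, HIDDEN_DIM))
predicted_word_index = max(range(NUM_WORDS), key=probs.__getitem__)
```

Because the weights are random, the predicted index is meaningless here; in practice the weights would be learned from labeled recordings. Reading the sequence in both directions lets the classifier use context from the whole utterance, which suits isolated command words whose full duration is available before a decision is made.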

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Language
Radar
Recognition, Psychology
Speech Perception*
Speech*