Development of sEMG sensors and algorithms for silent speech recognition

Geoffrey S Meltzner; James T Heaton; Yunbin Deng; Gianluca De Luca; Serge H Roy; Joshua C Kline

doi:10.1088/1741-2552/aac965

J Neural Eng. 2018 Aug;15(4):046031. doi: 10.1088/1741-2552/aac965. Epub 2018 Jun 1.

Authors

Geoffrey S Meltzner¹, James T Heaton, Yunbin Deng, Gianluca De Luca, Serge H Roy, Joshua C Kline

Affiliation

¹ VocaliD, Inc. 50 Leonard St, Belmont, MA 02478, United States of America.

Abstract

Objective: Speech is among the most natural forms of human communication, thereby offering an attractive modality for human-machine interaction through automatic speech recognition (ASR). However, the limitations of ASR (including degradation in the presence of ambient noise, limited privacy, and poor accessibility for those with significant speech disorders) have created a need for alternative, non-acoustic modalities of subvocal or silent speech recognition (SSR).

Approach: We have developed a new system of face- and neck-worn sensors and signal processing algorithms capable of recognizing silently mouthed words and phrases entirely from the surface electromyographic (sEMG) signals recorded from the muscles of the face and neck that are involved in speech production. The algorithms were developed in stages by evolving the speech recognition models: first for recognizing isolated words by extracting speech-related features from sEMG signals, then for recognizing sequences of words from patterns of sEMG signals using grammar models, and finally for recognizing a vocabulary of previously untrained words using phoneme-based models. The final recognition algorithms were integrated with specially designed multi-point, miniaturized sensors that can be arranged in flexible geometries to record high-fidelity sEMG signals from small articulator muscles of the face and neck.
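The abstract does not state which speech-related features were extracted from the sEMG signals. Purely as an illustration of the first stage, the sketch below computes a windowed root-mean-square (RMS) amplitude, a commonly used sEMG feature. The choice of RMS, the function name `windowed_rms`, and the window and hop parameters are all assumptions made for this example and should not be read as the authors' method.

```python
import math


def windowed_rms(signal, window, hop):
    """Return the RMS amplitude of each full window of a 1-D signal.

    Hypothetical helper: RMS is an assumed stand-in feature, not the
    feature set used in the study. `window` and `hop` are in samples.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    features = []
    # Slide over the signal, keeping only windows that fit entirely.
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        features.append(math.sqrt(sum(x * x for x in frame) / window))
    return features


# Toy single-channel signal: a quiet segment followed by a burst of activity.
toy_channel = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
print(windowed_rms(toy_channel, window=4, hop=4))
```

In a multi-channel recording, a function like this would be applied to each sensor channel, and the per-window values would be stacked into one feature vector per time step before being passed to a recognizer.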

Main results: We tested the system of sensors and algorithms during a series of subvocal speech experiments involving more than 1200 phrases generated from a 2200-word vocabulary and achieved an 8.9% word error rate (91.1% recognition rate), far surpassing previous attempts in the field.
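Word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized phrase and the reference phrase, divided by the number of words in the reference. The sketch below is a minimal Python version under that standard definition. The function name and the example phrases are hypothetical and are not taken from the study, and the authors' scoring procedure may differ in detail.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word in a four-word phrase.
print(word_error_rate("turn the radio on", "turn a radio on"))  # 0.25
```

Under this definition, the recognition rate quoted alongside the error rate is its complement: 100% - 8.9% = 91.1%.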

Significance: These results demonstrate the viability of our system as an alternative modality of communication for a multitude of applications, including use by persons with speech impairments following a laryngectomy, military personnel requiring hands-free covert communication, and consumers in need of privacy while speaking on a mobile phone in public.

MeSH terms

Adult
Algorithms*
Electromyography / methods*
Electromyography / trends*
Facial Muscles / physiology
Female
Humans
Male
Neck Muscles / physiology
Speech Perception / physiology*
Speech Recognition Software / trends*
Young Adult

Grants and funding

R44 DC014870/DC/NIDCD NIH HHS/United States