Silent Speech Recognition as an Alternative Communication Device for Persons with Laryngectomy

Geoffrey S Meltzner; James T Heaton; Yunbin Deng; Gianluca De Luca; Serge H Roy; Joshua C Kline

doi:10.1109/TASLP.2017.2740000

IEEE/ACM Trans Audio Speech Lang Process. 2017 Dec;25(12):2386-2398. doi: 10.1109/TASLP.2017.2740000. Epub 2017 Nov 28.

Authors

Geoffrey S Meltzner¹, James T Heaton², Yunbin Deng³, Gianluca De Luca⁴, Serge H Roy⁴, Joshua C Kline⁴

Affiliations

¹ VocaliD, Inc., Belmont, MA 02478, USA.
² Harvard Medical School in the Department of Surgery, Massachusetts General Hospital Voice Center, Boston, MA 02114, USA.
³ BAE Systems, Burlington, MA 01803, USA.
⁴ Delsys, Inc., and Altec, Inc., Natick, MA 01760, USA.

Abstract

Each year thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to verbally communicate. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from the neck and face, and used for automatic speech recognition to provide speech-to-text or synthesized speech as an alternative means of communication. This is true even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, 8 individuals at least 6 months after total laryngectomy were recorded using 8 sEMG sensors on their face (4) and neck (4) while reading phrases constructed from a 2,500-word vocabulary. A unique set of phrases was used for training phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used for testing word recognition of the models based on phoneme identification from running speech. Word error rates were on average 10.3% for the full 8-sensor set (averaging 9.5% for the top 4 participants), and 13.6% when reducing the sensor set to 4 locations per individual (n=7). This study provides a compelling proof-of-concept for sEMG-based alaryngeal speech recognition, with the strong potential to further improve recognition performance.
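
The abstract reports results as word error rates but does not describe the scoring procedure. As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions from a word-level edit-distance alignment, divided by the number of reference words), the metric could be computed as follows. The function name and the example phrases are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes the conventional WER definition; the study's exact scoring
    tool is not stated in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word in a six-word reference phrase.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER = {wer:.1%}")
```

Under this definition, a word error rate of 10.3% corresponds to roughly one word error for every ten reference words.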

Keywords: Alaryngeal Speech; Assistive technology; Augmentative and Alternative Communication; Automatic Speech Recognition; EMG; Subvocal Speech Recognition; electromyography.

Grants and funding

R44 DC014870/DC/NIDCD NIH HHS/United States