Overt speech decoding from cortical activity: a comparison of different linear methods

Front Hum Neurosci. 2023 Jun 23;17:1124065. doi: 10.3389/fnhum.2023.1124065. eCollection 2023.

Abstract

Introduction: Speech BCIs aim to reconstruct speech in real time from ongoing cortical activity. An ideal BCI would reconstruct the speech audio signal frame by frame on a millisecond timescale, which requires fast computation. In this respect, linear decoders are good candidates and have been widely used in motor BCIs. Yet they have seldom been studied for speech reconstruction, and never for reconstructing articulatory movements from intracranial activity. Here, we compared vanilla linear regression, ridge-regularized linear regression, and partial least squares regression for offline decoding of overt speech from cortical activity.
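As a minimal illustrative sketch (not the authors' code), the three linear decoder families compared here can be set up side by side with scikit-learn, assuming a feature matrix X of time-lagged neural features (frames x features) and a target matrix Y of vocoder features; the random data, regularization strength, and number of latent components below are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 256))   # placeholder neural features (frames x features)
Y = rng.standard_normal((5000, 25))    # placeholder vocoder features (frames x dims)

# Keep temporal order when splitting, as is usual for time-series decoding
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)

decoders = {
    "vanilla": LinearRegression(),
    "ridge": Ridge(alpha=10.0),               # alpha is a free hyperparameter
    "pls": PLSRegression(n_components=20),    # number of components is a free hyperparameter
}

for name, model in decoders.items():
    model.fit(X_tr, Y_tr)
    Y_hat = model.predict(X_te)
    # Pearson correlation per output dimension, averaged (the evaluation metric used in the paper)
    r = [np.corrcoef(Y_te[:, k], Y_hat[:, k])[0, 1] for k in range(Y.shape[1])]
    print(f"{name}: mean r = {np.mean(r):.3f}")
```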

Methods: Two decoding paradigms were investigated: (1) direct decoding of acoustic vocoder features of speech, and (2) indirect decoding of vocoder features through an intermediate articulatory representation chained with a real-time-compatible DNN-based articulatory-to-acoustic synthesizer. The participant's articulatory trajectories were estimated from an electromagnetic-articulography dataset using dynamic time warping. The accuracy of the decoders was evaluated by computing correlations between original and reconstructed features.
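The DTW-based estimation of articulatory trajectories can be sketched as follows (again not the authors' code): align the participant's audio to a reference recording from the electromagnetic-articulography dataset via spectrogram DTW, then warp the reference articulatory trajectories onto the participant's timeline. File names, feature settings, and the frame-averaging rule are illustrative assumptions.

```python
import numpy as np
import librosa

def mel_features(path, sr=16000, hop=160):
    """Log-mel spectrogram (n_mels x frames) used as the DTW alignment feature."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop, n_mels=40)
    return librosa.power_to_db(S)

ref_mel = mel_features("ema_reference.wav")   # audio paired with the EMA trajectories
par_mel = mel_features("participant.wav")     # participant's overt speech

# Cost matrix and warping path between the two utterances
_, wp = librosa.sequence.dtw(X=ref_mel, Y=par_mel, metric="cosine")
wp = wp[::-1]  # librosa returns the path end-to-start

# ema: (n_ref_frames, n_articulatory_dims) trajectories recorded with the reference audio
ema = np.load("ema_trajectories.npy")
n_par_frames = par_mel.shape[1]

# For each participant frame, average the reference EMA frames mapped onto it
aligned = np.zeros((n_par_frames, ema.shape[1]))
counts = np.zeros(n_par_frames)
for i_ref, j_par in wp:
    aligned[j_par] += ema[i_ref]
    counts[j_par] += 1
aligned /= np.maximum(counts, 1)[:, None]
```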

Results: All linear methods achieved similar performance, well above chance level, although the reconstructed speech did not reach intelligibility. Direct and indirect approaches performed comparably, with an advantage for direct decoding.

Discussion: Future work will address the development of an improved neural speech decoder compatible with fast frame-by-frame speech reconstruction from ongoing activity at a millisecond timescale.

Keywords: ECoG; articulatory synthesis; brain-computer interface; decoding; intracranial recordings; linear methods; speech prostheses.

Grants and funding

This work was supported by the FRM Foundation under Grant No. DBS20140930785, the French National Research Agency under Grant Agreement No. ANR-16-CE19-0005-01 (Brainspeak), and by the European Union's Horizon 2020 Research and Innovation Program under Grant Agreement No. 732032 (BrainCom).