Combining brain-computer interfaces with deep reinforcement learning for robot training: a feasibility study in a simulation environment

Front Neuroergon. 2023 Nov 23;4:1274730. doi: 10.3389/fnrgo.2023.1274730. eCollection 2023.

Abstract

Deep reinforcement learning (RL) is a strategy for teaching robot agents to learn complex tasks autonomously. While sparse rewards are a natural way to define rewards in realistic robot scenarios, they provide poor learning signals for the agent, which makes the design of good reward functions challenging. To overcome this challenge, we use learning from human feedback delivered through an implicit brain-computer interface (BCI). We combined a BCI with deep RL for robot training in a physically realistic 3-D simulation environment. In a first study, we compared the feasibility of different electroencephalography (EEG) systems (wet vs. dry electrodes) and their application to the automatic classification of perceived errors during a robot task, using different machine learning models. In a second study, we compared the performance of BCI-based deep RL training to training with feedback given explicitly by participants. The findings of the first study indicate that a high-quality dry-electrode EEG system can provide a robust and fast method for automatically assessing robot behavior using a convolutional neural network model. The results of the second study show that the implicit BCI-based deep RL variant, combined with the dry EEG system, can significantly accelerate the learning process in a realistic 3-D robot simulation environment. The performance of the BCI-trained deep RL model was even comparable to that achieved with explicit human feedback. Our findings support BCI-based deep RL methods as a valid alternative in human-robot applications where cognitively demanding explicit human feedback is not available.
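To make the core idea concrete, the following is a minimal Python sketch of how an implicit BCI signal could densify a sparse RL reward. The names (ErrPClassifier, shaped_reward, penalty_scale) and the additive reward-shaping form are illustrative assumptions, not the authors' implementation; a real system would replace the stubbed classifier with the trained convolutional neural network that decodes error-related potentials from EEG epochs.

    import numpy as np

    class ErrPClassifier:
        """Stand-in for a trained CNN that decodes error-related
        potentials (ErrPs) from a single EEG epoch. Stubbed with a
        random output here so the sketch runs stand-alone."""
        def predict_error_probability(self, eeg_epoch: np.ndarray) -> float:
            # A real model would return P(error | EEG epoch).
            return float(np.clip(np.random.beta(2, 5), 0.0, 1.0))

    def shaped_reward(sparse_reward: float,
                      error_probability: float,
                      penalty_scale: float = 1.0) -> float:
        """Augment a sparse task reward with an implicit penalty derived
        from the decoded probability that the observer perceived an error."""
        return sparse_reward - penalty_scale * error_probability

    # Usage: inside the RL training loop, after each robot action the
    # current EEG epoch is decoded and the result densifies the reward.
    classifier = ErrPClassifier()
    eeg_epoch = np.random.randn(32, 250)  # e.g., 32 channels x 250 samples
    sparse_r = 0.0                        # no task reward on this step
    r = shaped_reward(sparse_r, classifier.predict_error_probability(eeg_epoch))
    print(f"shaped reward: {r:.3f}")

In this sketch the shaped reward stays aligned with the sparse task reward while giving the agent feedback on every step, which is the mechanism by which implicit BCI feedback can accelerate learning relative to sparse rewards alone.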

Keywords: brain-computer interface; deep reinforcement learning; electroencephalography; error monitoring; event-related potentials (ERP); machine learning; robotics.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by grants from the Fraunhofer Internal Programs under Grant No. Discover 600 030 and the Ministry of Economic Affairs, Labor and Tourism Baden-Wuerttemberg; Project: KI-Fortschrittszentrum Lernende Systeme und Kognitive Robotik.