Indirect and direct training of spiking neural networks for end-to-end control of a lane-keeping vehicle

Neural Netw. 2020 Jan;121:21-36. doi: 10.1016/j.neunet.2019.05.019. Epub 2019 Jul 9.

Abstract

Building spiking neural networks (SNNs) on biological synaptic plasticity holds promise for fast, energy-efficient computing, which benefits mobile robotic applications. However, the use of SNNs in robotics has so far been limited by the lack of practical training methods. In this paper, we therefore introduce both indirect and direct end-to-end training methods for SNNs controlling a lane-keeping vehicle. First, we learn a policy with the Deep Q-Network (DQN) algorithm and then transfer it to an SNN via supervised learning. Second, we train the SNN directly with reward-modulated spike-timing-dependent plasticity (R-STDP), which combines the advantages of reinforcement learning and the well-known spike-timing-dependent plasticity (STDP). We evaluate the proposed approaches in three scenarios in which a robot, observing the lane markings through an event-based neuromorphic vision sensor, is controlled to stay within the lane. We further demonstrate the advantages of the R-STDP approach in terms of lateral localization accuracy and training time steps by comparing it with the other three algorithms presented in this paper.
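The sketch below (not taken from the paper) illustrates how an R-STDP update is commonly formulated: a pair-based STDP rule writes into an eligibility trace rather than directly into the weights, and a scalar reward, here imagined as derived from the vehicle's lateral deviation, gates the actual weight change. All variable names, constants, and the random spike trains are illustrative assumptions, not the authors' implementation.

```python
# Minimal illustrative sketch of reward-modulated STDP (R-STDP) for one layer
# of synapses. Everything here is an assumption for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n_pre, n_post = 64, 2                       # e.g. event-camera inputs -> left/right motor neurons
w = rng.uniform(0.0, 0.5, (n_pre, n_post))  # synaptic weights
elig = np.zeros_like(w)                     # eligibility trace c_ij

tau_c = 1.0                  # eligibility trace time constant (s)
a_plus, a_minus = 1.0, -1.0  # STDP amplitudes (pre-before-post / post-before-pre)
lr = 0.01                    # learning rate scaling the reward modulation
dt = 0.05                    # simulation time step (s)

def stdp_increment(pre_trace, post_spikes, post_trace, pre_spikes):
    """Pair-based STDP contribution for one time step."""
    ltp = a_plus * np.outer(pre_trace, post_spikes)   # potentiation at post spikes
    ltd = a_minus * np.outer(pre_spikes, post_trace)  # depression at pre spikes
    return ltp + ltd

pre_trace = np.zeros(n_pre)
post_trace = np.zeros(n_post)

for step in range(1000):
    # Hypothetical binary spike vectors from the sensor and the motor neurons.
    pre_spikes = (rng.random(n_pre) < 0.1).astype(float)
    post_spikes = (rng.random(n_post) < 0.1).astype(float)

    # Low-pass spike traces used by the pair-based STDP rule.
    pre_trace = pre_trace * np.exp(-dt / 0.02) + pre_spikes
    post_trace = post_trace * np.exp(-dt / 0.02) + post_spikes

    # Accumulate the STDP outcome in the eligibility trace instead of the weights.
    elig = elig * np.exp(-dt / tau_c) + stdp_increment(pre_trace, post_spikes,
                                                       post_trace, pre_spikes)

    # Hypothetical reward, e.g. based on the robot's lateral deviation from lane center.
    reward = rng.normal()

    # R-STDP: the reward gates whether the eligibility trace is written into the weights.
    w += lr * reward * elig
    w = np.clip(w, 0.0, 1.0)
```

In such a scheme the choice of the eligibility time constant determines how long a spike pairing can wait for a delayed reward before it stops influencing the weights.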

Keywords: End-to-end learning; Lane keeping; R-STDP; Spiking neural network.

MeSH terms

  • Algorithms
  • Neural Networks, Computer*
  • Neuronal Plasticity / physiology*
  • Neurons / physiology*
  • Reinforcement, Psychology
  • Robotics / instrumentation*
  • Robotics / methods