Event-Based Gesture Recognition through a Hierarchy of Time-Surfaces for FPGA

Ricardo Tapiador-Morales; Jean-Matthieu Maro; Angel Jimenez-Fernandez; Gabriel Jimenez-Moreno; Ryad Benosman; Alejandro Linares-Barranco

doi:10.3390/s20123404

Sensors (Basel). 2020 Jun 16;20(12):3404. doi: 10.3390/s20123404.

Authors

Ricardo Tapiador-Morales¹,², Jean-Matthieu Maro³, Angel Jimenez-Fernandez¹,⁴, Gabriel Jimenez-Moreno¹,⁴, Ryad Benosman³, Alejandro Linares-Barranco¹,⁴

Affiliations

¹ Robotics and Technology of Computers Lab (ETSII-EPS), University of Seville, 41089 Sevilla, Spain.
² aiCTX AG, 8092 Zurich, Switzerland.
³ Neuromorphic Vision and Natural Computation, Sorbonne Université, 75006 Paris, France.
⁴ SCORE Lab, Research Institute of Computer Engineering (I3US), University of Seville, 41089 Seville, Spain.

Abstract

Neuromorphic vision sensors, also known as Dynamic Vision Sensors (DVS), take inspiration from the mammalian retina: they detect changes in luminosity and provide a stream of events with high temporal resolution. This continuous stream of events can be used to extract spatio-temporal patterns from a scene. A time-surface represents the spatio-temporal context within a given spatial radius around an incoming sensor event, over a specific time history. Time-surfaces can be organized hierarchically to extract features from input events using the Hierarchy Of Time-Surfaces algorithm, hereinafter HOTS. HOTS can be arranged in consecutive layers to extract combinations of features, much as some deep-learning algorithms do. This work introduces a novel FPGA architecture for accelerating HOTS networks. The architecture is based mainly on block-RAM memory and the non-restoring square root algorithm; it requires only basic components, which makes it suitable for low-power, low-latency embedded applications. The architecture was tested on a Zynq 7100 platform running at 100 MHz. Latencies range from 1 μs to 6.7 μs, with a maximum dynamic power consumption of 77 mW. The system was also tested on a gesture recognition dataset. With 16-bit precision, it loses only 1.2% accuracy with respect to the original software implementation of HOTS.
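To make the time-surface idea concrete, here is a minimal Python sketch of one HOTS layer processing a single event. It makes the following assumptions:

- The time-surface uses the exponential decay exp(-(t - t_last)/tau) from the original software HOTS formulation.
- There is a single input polarity.
- Each surface is matched to its nearest prototype by Euclidean distance.
- The prototypes are fixed, so online learning is omitted.

The function names (`time_surface`, `nearest_prototype`, `hots_layer_event`) and the parameters `radius` and `tau` are hypothetical. The sketch does not describe the FPGA design itself.

```python
import math


def time_surface(last_ts, x, y, t, radius, tau):
    """Return a (2*radius+1) x (2*radius+1) time-surface centred on (x, y).

    last_ts[row][col] holds the latest event timestamp seen at that pixel,
    or None if the pixel has not fired yet. Each neighbour contributes
    exp(-(t - t_last) / tau), so recent activity is close to 1 and older
    activity decays towards 0.
    """
    height, width = len(last_ts), len(last_ts[0])
    surface = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            inside = 0 <= yy < height and 0 <= xx < width
            if inside and last_ts[yy][xx] is not None:
                row.append(math.exp(-(t - last_ts[yy][xx]) / tau))
            else:
                row.append(0.0)  # outside the sensor or never fired
        surface.append(row)
    return surface


def nearest_prototype(surface, prototypes):
    """Index of the prototype closest to the flattened surface (Euclidean)."""
    flat = [v for row in surface for v in row]
    best_k, best_d = -1, math.inf
    for k, proto in enumerate(prototypes):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(flat, proto)))
        if d < best_d:
            best_k, best_d = k, d
    return best_k


def hots_layer_event(last_ts, event, prototypes, radius, tau):
    """Process one (x, y, t) event and emit (x, y, t, feature_index)."""
    x, y, t = event
    # Record the event first so it sits at the centre with value exp(0) = 1.
    # Simplification: one polarity; HOTS keeps one timestamp map per polarity.
    last_ts[y][x] = t
    surface = time_surface(last_ts, x, y, t, radius, tau)
    return (x, y, t, nearest_prototype(surface, prototypes))


if __name__ == "__main__":
    grid = [[None] * 5 for _ in range(5)]
    protos = [[0.0] * 9, [0.0] * 4 + [1.0] + [0.0] * 4]
    print(hots_layer_event(grid, (2, 2, 0.0), protos, radius=1, tau=10.0))
    # prints (2, 2, 0.0, 1)
```

Each output event keeps the input position and timestamp but replaces the polarity with the index of the winning prototype. The next layer can therefore treat feature indices as polarities. Stacking such layers, typically with growing radii and time constants, gives the hierarchy of features described above.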
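The abstract names the non-restoring square root as one of the architecture's two building blocks. The following Python sketch shows that algorithm computing the integer square root and remainder, assuming an unsigned 32-bit radicand and hence a 16-bit root. In HOTS, a square root is presumably needed for the Euclidean distance between a time-surface and each prototype. That use, the bit widths and the name `nonrestoring_isqrt` are assumptions for illustration, not the authors' HDL.

```python
def nonrestoring_isqrt(d, width=32):
    """Return (root, remainder) of an unsigned `width`-bit integer d.

    Non-restoring scheme: each result bit costs one shift and one add OR
    subtract (chosen by the sign of the partial remainder); a negative
    remainder is never restored inside the loop.
    """
    if width % 2 or d < 0 or d >= (1 << width):
        raise ValueError("width must be even and 0 <= d < 2**width")
    q = 0  # partial root, one new bit per iteration
    r = 0  # signed partial remainder
    for i in range(width // 2 - 1, -1, -1):
        pair = (d >> (2 * i)) & 3  # next two radicand bits, MSB first
        if r >= 0:
            r = (r << 2) + pair - ((q << 2) | 1)
        else:
            r = (r << 2) + pair + ((q << 2) | 3)
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:  # single final correction makes the remainder non-negative
        r += (q << 1) | 1
    return q, r


if __name__ == "__main__":
    print(nonrestoring_isqrt(17))  # prints (4, 1)
```

Each iteration needs only a shift and a single adder/subtractor, whose operation is selected by the sign bit of the remainder. The loop therefore maps onto a small iterative or pipelined datapath, consistent with the abstract's point that the design requires only basic components.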
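At the stated 100 MHz clock (10 ns period), the reported latency range converts directly into clock cycles per event:

```latex
\[
N_{\text{cycles}} = t_{\text{lat}} \cdot f_{\text{clk}}, \qquad
1\,\mu\text{s} \times 100\,\text{MHz} = 100, \qquad
6.7\,\mu\text{s} \times 100\,\text{MHz} = 670
\]
```

Each event is thus processed in roughly 100 to 670 clock cycles.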

Keywords: AER; FPGA; HDL; dynamic vision sensors; event-based; pattern recognition; synchronous digital VLSI.

Grants and funding

TEC2016-77785-P/Ministerio de Economía y Competitividad