Efficient SNN multi-cores MAC array acceleration on SpiNNaker 2

Jiaxin Huang; Florian Kelber; Bernhard Vogginger; Chen Liu; Felix Kreutz; Pascal Gerhards; Daniel Scholz; Klaus Knobloch; Christian G Mayr

doi:10.3389/fnins.2023.1223262


Front Neurosci. 2023 Aug 7:17:1223262. doi: 10.3389/fnins.2023.1223262. eCollection 2023.

Authors

Jiaxin Huang¹, Florian Kelber², Bernhard Vogginger², Chen Liu², Felix Kreutz¹, Pascal Gerhards¹, Daniel Scholz¹, Klaus Knobloch¹, Christian G Mayr²,³

Affiliations

¹ Infineon Technologies Dresden, Dresden, Germany.
² Highly-Parallel VLSI-Systems and Neuro-Microelectronics, Faculty of Electrical and Computer Engineering, Institute of Principles of Electrical and Electronic Engineering, Technische Universität Dresden, Dresden, Germany.
³ Centre for Tactile Internet with Human-in-the-Loop (CeTI), Cluster of Excellence, Technische Universität Dresden, Dresden, Germany.

Abstract

The potentially low energy consumption of spiking neural networks (SNNs) has drawn the attention of the AI community. Processing SNNs on CPUs alone inevitably leads to long execution times for large models and massive datasets. This study introduces the MAC array, a parallel architecture present on each processing element (PE) of SpiNNaker 2, into the computation of SNN inference. Building on earlier single-core optimization algorithms, we investigate parallel acceleration algorithms that let the MAC arrays of multiple cores work together. The proposed Echelon Reorder algorithm for densifying model information, together with the adapted multi-core two-stage splitting and authorization deployment strategies, achieves efficient spatio-temporal load balancing and optimization performance. We evaluate performance by benchmarking a wide range of constructed SNN models to study how strongly different factors influence the results. We also benchmark two real SNN models (a gesture recognition model from a real-world application and a balanced random cortex-like network from neuroscience) on the neuromorphic multi-core hardware SpiNNaker 2. With mixed processors, the echelon optimization algorithm reduces the memory footprint to 74.28% and 85.78% of that of the original MAC calculation on these two models, respectively. The execution time of the echelon algorithms, using either MAC arrays only or mixed processors, is at most 24.56% of the serial ARM baseline. Accelerating SNN inference with the algorithms in this study is essentially a general sparse matrix-matrix multiplication (SpGEMM) problem. This article thus explicitly extends the application domain of SpGEMM to SNNs and develops novel SpGEMM optimization algorithms suited to SNN characteristics and the MAC array.
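To illustrate why SNN inference reduces to SpGEMM, here is a minimal sketch under our own assumptions. It uses a binary spike raster S (T time steps by N_pre presynaptic neurons), a dense weight matrix W (N_pre by N_post), and plain Python in place of the SpiNNaker 2 MAC array. The synaptic input to all postsynaptic neurons is then I = S·W. Because S is sparse and binary, only the weight rows of neurons that actually spiked need to be accumulated. The helper names (`spikes_to_csr`, `spgemm_spikes_weights`, `split_columns`) are hypothetical. The column split across "cores" is a generic partitioning that only shows how per-core partial results concatenate into the full product. It is not the paper's Echelon Reorder algorithm or its two-stage splitting strategy.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def spikes_to_csr(spikes: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Compress a binary spike raster (T x N_pre) into CSR-style arrays.

    Only the indices of presynaptic neurons that fired at each time step are
    stored, which is where the sparsity of SNN activity pays off.
    """
    row_ptr = [0]
    col_idx: List[int] = []
    for row in spikes:
        col_idx.extend(j for j, s in enumerate(row) if s)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def spgemm_spikes_weights(row_ptr: List[int], col_idx: List[int],
                          weights: Matrix, post_cols: range) -> Matrix:
    """Compute I[t][k] = sum_j S[t][j] * W[j][k] for every k in post_cols.

    Since S is binary, each product term is just the weight itself, so the
    kernel accumulates the weight rows of the neurons that spiked at time t.
    """
    n_steps = len(row_ptr) - 1
    out = [[0.0] * len(post_cols) for _ in range(n_steps)]
    for t in range(n_steps):
        for j in col_idx[row_ptr[t]:row_ptr[t + 1]]:
            w_row = weights[j]
            for c, k in enumerate(post_cols):
                out[t][c] += w_row[k]
    return out


def split_columns(n_post: int, n_cores: int) -> List[range]:
    """Hypothetical partitioning: contiguous, near-equal blocks of
    postsynaptic neurons, one block per core."""
    base, extra = divmod(n_post, n_cores)
    blocks, start = [], 0
    for c in range(n_cores):
        size = base + (1 if c < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    spikes = [[1, 0, 0, 1],
              [0, 0, 0, 0],
              [0, 1, 1, 0]]            # T = 3 time steps, N_pre = 4
    weights = [[0.5, -1.0, 2.0],
               [1.0, 0.0, 0.5],
               [0.0, 3.0, 1.0],
               [-0.5, 1.0, 0.0]]       # N_pre = 4, N_post = 3

    row_ptr, col_idx = spikes_to_csr(spikes)
    full = spgemm_spikes_weights(row_ptr, col_idx, weights, range(3))
    print(full)  # [[0.0, 0.0, 2.0], [0.0, 0.0, 0.0], [1.0, 3.0, 1.5]]

    # Each "core" computes its own column block; concatenating the blocks
    # row by row reproduces the full product.
    parts = [spgemm_spikes_weights(row_ptr, col_idx, weights, b)
             for b in split_columns(3, n_cores=2)]
    merged = [sum((p[t] for p in parts), []) for t in range(len(full))]
    assert merged == full
```

Splitting along the postsynaptic dimension keeps the per-core results independent, so no reduction across cores is needed. A real deployment would also need to balance memory and time per core, which is the problem the paper's own splitting and deployment strategies target.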

Keywords: MAC array; SNN; SpGEMM; SpiNNaker 2; multi-core load balancing deployment.

Grants and funding

This study was partially funded by the German Federal Ministry of Education and Research (BMBF) within the KI-ASIC project (16ES0993 and 16ES0996) and the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft) as part of Germany's Excellence Strategy—EXC 2050/1—Project ID 390696704—Cluster of Excellence Centre for Tactile Internet with Human-in-the-Loop (CeTI) of Technische Universität Dresden.