Effective Plug-Ins for Reducing Inference-Latency of Spiking Convolutional Neural Networks During Inference Phase

Front Comput Neurosci. 2021 Oct 18:15:697469. doi: 10.3389/fncom.2021.697469. eCollection 2021.

Abstract

Convolutional Neural Networks (CNNs) are effective and mature in the field of classification, while Spiking Neural Networks (SNNs) are energy-saving for their sparsity of data flow and event-driven working mechanism. Previous work demonstrated that CNNs can be converted into equivalent Spiking Convolutional Neural Networks (SCNNs) without obvious accuracy loss, including different functional layers such as Convolutional (Conv), Fully Connected (FC), Avg-pooling, Max-pooling, and Batch-Normalization (BN) layers. To reduce inference-latency, existing researches mainly concentrated on the normalization of weights to increase the firing rate of neurons. There are also some approaches during training phase or altering the network architecture. However, little attention has been paid on the end of inference phase. From this new perspective, this paper presents 4 stopping criterions as low-cost plug-ins to reduce the inference-latency of SCNNs. The proposed methods are validated using MATLAB and PyTorch platforms with Spiking-AlexNet for CIFAR-10 dataset and Spiking-LeNet-5 for MNIST dataset. Simulation results reveal that, compared to the state-of-the-art methods, the proposed method can shorten the average inference-latency of Spiking-AlexNet from 892 to 267 time steps (almost 3.34 times faster) with the accuracy decline from 87.95 to 87.72%. With our methods, 4 types of Spiking-LeNet-5 only need 24-70 time steps per image with the accuracy decline not more than 0.1%, while models without our methods require 52-138 time steps, almost 1.92 to 3.21 times slower than us.

Keywords: artificial neural network; deep learning; deep networks; inference-latency; object classification; spiking network conversion; spiking neural network.