A Low-Latency DNN Accelerator Enabled by DFT-Based Convolution Execution Within Crossbar Arrays

IEEE Trans Neural Netw Learn Syst. 2023 Nov 29:PP. doi: 10.1109/TNNLS.2023.3327122. Online ahead of print.

Abstract

Analog resistive random access memory (RRAM) devices enable parallelized, nonvolatile in-memory vector-matrix multiplications for neural networks, eliminating the bottlenecks posed by the von Neumann architecture. While RRAMs improve accelerator performance and enable deployment at the edge, the long tuning time needed to update RRAM conductance states adds significant burden and latency to real-time system training. In this article, we develop an in-memory discrete Fourier transform (DFT)-based convolution methodology that reduces system latency and input regeneration. By storing the static DFT/inverse DFT (IDFT) coefficients within the analog arrays, we keep the computational operations performed in digital circuits to a minimum. By performing the convolution in reciprocal Fourier space, our approach minimizes connection weight updates, which significantly accelerates both neural network training and inference. Moreover, by minimizing the RRAM conductance update frequency, we mitigate the endurance limitations of resistive nonvolatile memories. We show that by leveraging the symmetry and linearity of the DFT/IDFT, we can reduce power by 1.57× relative to conventional convolution execution. The designed hardware-aware deep neural network (DNN) inference accelerator improves peak power efficiency by 28.02× and area efficiency by 8.7× over state-of-the-art accelerators. This article paves the way for ultrafast, low-power, compact hardware accelerators.
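For readers unfamiliar with the convolution-theorem mapping the abstract relies on, the following minimal NumPy sketch (our illustration, not the authors' implementation; the array size n and the random input/kernel are arbitrary assumptions) shows why storing static DFT/IDFT coefficient matrices suffices: convolution reduces to two fixed vector-matrix multiplies, which a crossbar can execute in place, plus one elementwise product in Fourier space.

    # Minimal sketch: circular convolution via explicit DFT/IDFT matrices,
    # mirroring how static coefficients could be mapped onto crossbar arrays.
    import numpy as np

    def dft_matrix(n):
        """Dense n x n DFT coefficient matrix W[j, k] = exp(-2*pi*i*j*k / n)."""
        j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        return np.exp(-2j * np.pi * j * k / n)

    n = 8                            # illustrative size (assumption)
    x = np.random.randn(n)           # input vector
    w = np.random.randn(n)           # kernel, zero-padded to length n if shorter

    F = dft_matrix(n)                # static coefficients: programmed once
    F_inv = np.conj(F) / n           # IDFT matrix; also static

    # Convolution theorem: elementwise product of spectra, then inverse DFT.
    y_fourier = F_inv @ ((F @ x) * (F @ w))

    # Reference: direct circular convolution.
    y_direct = np.array([sum(x[m] * w[(i - m) % n] for m in range(n))
                         for i in range(n)])

    assert np.allclose(y_fourier.real, y_direct)

Because F and F_inv never change, the data-dependent work per input is only the elementwise spectral product, which is consistent with the abstract's claim of minimal conductance updates; linear (noncircular) convolution would be recovered by zero-padding x and w before transforming.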