Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip

Stefania Perri; Cristian Sestito; Fanny Spagnolo; Pasquale Corsonello

doi:10.3390/jimaging6090085

J Imaging. 2020 Aug 25;6(9):85. doi: 10.3390/jimaging6090085.

Authors

Stefania Perri¹, Cristian Sestito², Fanny Spagnolo², Pasquale Corsonello²

Affiliations

¹ Department of Mechanical, Energy and Management Engineering, University of Calabria, 87036 Rende, Italy.
² Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, 87036 Rende, Italy.

Abstract

Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracy they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand still make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies 5.6%, 4.1%, 17%, and 96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the increased parallelism exploitable, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform.
Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to 20% higher and reduces power consumption and logic resource requirements by more than 60% and 80%, respectively, while using 5.7× fewer on-chip memory resources.
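
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of a 2D deconvolution (transposed convolution) in its textbook "scatter" form. It is a reference illustration only and does not reproduce the hardware-oriented computational approach proposed in the paper; the function name `deconv2d`, the single-channel nested-list layout, and the no-padding convention are assumptions made for this example.

```python
def deconv2d(x, k, stride=2):
    """Naive single-channel 2D transposed convolution without padding.

    x and k are lists of rows (hypothetical layout chosen for this sketch).
    Each input pixel scatters a scaled copy of the kernel into the output.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    # Output size of a transposed convolution with no padding.
    oh = (h - 1) * stride + kh
    ow = (w - 1) * stride + kw
    out = [[0] * ow for _ in range(oh)]
    for i in range(h):
        for j in range(w):
            for m in range(kh):
                for n in range(kw):
                    # Overlapping kernel footprints accumulate by addition.
                    out[i * stride + m][j * stride + n] += x[i][j] * k[m][n]
    return out


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    k = [[1, 1], [1, 1]]
    for row in deconv2d(x, k, stride=2):
        print(row)
```

With a stride equal to the kernel size the scaled kernel copies do not overlap; with a smaller stride they overlap and are summed, which is the access pattern that makes a direct hardware mapping of this form non-trivial.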

Keywords: Field-Programmable Gate Array (FPGA); Generative Adversarial Networks (GANs); heterogeneous embedded systems; image deconvolution.

Grants and funding

POR Calabria FSE/FESR 2014-2020 - International Mobility of PhD students and research grants/type A Researchers - Actions 10.5.6 and 10.5.12/Regione Calabria - Italy