Mapping Neural Networks to FPGA-Based IoT Devices for Ultra-Low Latency Processing

Maciej Wielgosz; Michał Karwatowski

doi:10.3390/s19132981

Sensors (Basel). 2019 Jul 5;19(13):2981. doi: 10.3390/s19132981.

Authors

Maciej Wielgosz¹,², Michał Karwatowski³,⁴

Affiliations

¹ Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, al. Adama Mickiewicza 30, 30-059 Cracow, Poland. wielgosz@agh.edu.pl.
² Academic Computer Centre CYFRONET AGH, ul. Nawojki 11, 30-072 Cracow, Poland. wielgosz@agh.edu.pl.
³ Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, al. Adama Mickiewicza 30, 30-059 Cracow, Poland.
⁴ Academic Computer Centre CYFRONET AGH, ul. Nawojki 11, 30-072 Cracow, Poland.

Abstract

In Internet of Things (IoT) infrastructure, fast access to knowledge becomes critical. In some application domains, such as robotics, autonomous driving, predictive maintenance, and anomaly detection, the response time of the system matters more for Quality of Service than the quality of the answer. In this paper, we propose a methodology, that is, a set of predefined steps for mapping neural models to hardware, especially field-programmable gate arrays (FPGAs), with the main focus on latency reduction. A multi-objective covariance matrix adaptation evolution strategy (MO-CMA-ES) was employed along with custom scores for sparsity, bit-width of the representation, and quality of the model. Furthermore, we created a framework that enables the mapping of neural models to FPGAs. The proposed solution is validated using three case studies, with the Xilinx Zynq UltraScale+ MPSoC XCZU15EG as the platform. The results show the compression ratio achieved by quantization and pruning in different scenarios, with and without retraining procedures. Using our publicly available framework, we achieved a latency of 210 ns for a single processing step of a model composed of two long short-term memory (LSTM) layers and a single dense layer.
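
The quantities named in the abstract (sparsity, bit-width, compression ratio) can be illustrated with a minimal Python sketch. This is not the authors' framework or their scoring functions, which the abstract does not specify. The function names, the magnitude-based pruning rule, the uniform signed quantizer, and the 32-bit baseline are all assumptions chosen for illustration, and the compression estimate ignores the storage needed for the indices of nonzero weights.

```python
import random


def prune_by_magnitude(weights, threshold):
    """Zero out every weight whose absolute value is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize_uniform(weights, bits):
    """Round weights onto a signed uniform grid with the given bit-width (bits >= 2)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale for all-zero input
    levels = 2 ** (bits - 1) - 1                   # largest positive integer code
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def compression_ratio(weights, bits, baseline_bits=32):
    """Baseline storage divided by storage of the nonzero weights at `bits` each.

    Assumption: index overhead for the sparse representation is ignored.
    """
    nonzero = sum(1 for w in weights if w != 0.0)
    if nonzero == 0:
        return float("inf")
    return (len(weights) * baseline_bits) / (nonzero * bits)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical weight vector standing in for one layer of a trained model.
    layer = [random.gauss(0.0, 1.0) for _ in range(1000)]
    compact = quantize_uniform(prune_by_magnitude(layer, 0.5), bits=8)
    print("sparsity:", sparsity(compact))
    print("compression ratio:", compression_ratio(compact, bits=8))
```

In a multi-objective search such as the one described above, scores of this kind would be traded off against a model-quality score. How the authors define and combine them is not stated in the abstract.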

Keywords: Deep Learning; FPGA; Internet of Things (IoT); Neural Networks; Recurrent Neural Network (RNN).