SCA: Search-Based Computing Hardware Architecture with Precision Scalable and Computation Reconfigurable Scheme

Sensors (Basel). 2022 Nov 6;22(21):8545. doi: 10.3390/s22218545.

Abstract

Deep neural networks have been deployed on various hardware accelerators, such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuit (ASIC) chips. Inference normally requires a huge amount of computation, creating significant logic resource overheads. In addition, frequent data accesses between off-chip memory and hardware accelerators create bottlenecks, leading to a decline in hardware efficiency. Many solutions have been proposed to reduce hardware overhead and data movement. For example, lookup-table (LUT)-based hardware architectures can be used to mitigate computing operation demands. However, typical LUT-based accelerators suffer from limited computational precision and poor scalability. In this paper, we propose a search-based computing scheme built on an LUT solution, which improves computation efficiency by replacing traditional multiplication with a search operation. In addition, the proposed scheme supports multiple bit widths to meet the precision needs of different DNN-based applications. We design a reconfigurable computing strategy, which can efficiently adapt to convolutions of different kernel sizes to improve hardware scalability. We implement a search-based architecture, namely SCA, which adopts an on-chip storage mechanism, thus greatly reducing interactions with off-chip memory and alleviating bandwidth pressure. Based on experimental evaluation, the proposed SCA architecture can achieve 92%, 96% and 98% computational utilization at computational precisions of 4 bit, 8 bit and 16 bit, respectively. Compared with state-of-the-art LUT-based architectures, efficiency can be improved four-fold.
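To illustrate the general idea of replacing multiplication with a lookup, the following is a minimal sketch, not the paper's SCA implementation: for a fixed weight, all possible products with a `b`-bit unsigned activation are precomputed once, so a multiply-accumulate becomes a sequence of table lookups (a "search" in the LUT sense). The function names and the use of unsigned activations are assumptions for illustration only.

```python
# Hypothetical sketch of LUT-based multiply-accumulate (not the SCA design):
# precompute weight * a for every possible b-bit unsigned activation a,
# then replace each multiplication with an index into the table.

def build_lut(weight: int, bits: int) -> list[int]:
    """Precompute the product of `weight` with every b-bit activation."""
    return [weight * a for a in range(1 << bits)]

def lut_mac(lut: list[int], activations: list[int]) -> int:
    """Accumulate products via table lookups instead of multiplications."""
    return sum(lut[a] for a in activations)

# Example: 4-bit activations, fixed weight 3.
lut = build_lut(3, 4)
acts = [1, 5, 15]
assert lut_mac(lut, acts) == 3 * sum(acts)  # lookups match true products
```

A scheme along these lines trades multiplier logic for storage; the precision-scalable aspect described in the abstract would correspond to building and indexing tables for different bit widths.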

Keywords: computational utilization; look-up table; precision scalable; reconfigurable; search-based computing.

MeSH terms

  • Algorithms*
  • Computers
  • Neural Networks, Computer*

Grants and funding

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work was supported by the National Safety Academic Fund under Grant No. U2030204 and National Natural Science Foundation of China under Grant No. 62104025.