QTTNet: Quantized tensor train neural networks for 3D object and video recognition

Neural Netw. 2021 Sep;141:420-432. doi: 10.1016/j.neunet.2021.05.034. Epub 2021 Jun 5.

Abstract

Relying on the rapidly increasing capacity of computing clusters and hardware, convolutional neural networks (CNNs) have been successfully applied in various fields and have achieved state-of-the-art results. Despite these exciting developments, training and inference for large-scale CNN models still incur a huge memory cost, which makes such models hard to deploy on resource-limited portable devices. To address this problem, we establish a training framework for three-dimensional convolutional neural networks (3DCNNs), named QTTNet, that combines tensor train (TT) decomposition and data quantization to further shrink model size and decrease memory and time costs. Through this framework, we can fully exploit the strength of TT in reducing the number of trainable parameters and the advantage of quantization in decreasing the bit-width of data, compressing 3DCNN models greatly with little accuracy degradation. In addition, because all parameters involved in inference, including TT-cores, activations, and batch normalization parameters, are quantized to low bit-widths, the proposed method naturally saves memory and inference time. Experimental results on compressing 3DCNNs for 3D object and video recognition on the ModelNet40, UCF11, and UCF50 datasets verify the effectiveness of the proposed method. The best compression ratio we obtain is nearly 180×, with performance competitive with other state-of-the-art approaches. Moreover, the total size in bytes of our QTTNet models on the ModelNet40 and UCF11 datasets can be 1000× smaller than that of typical methods such as MVCNN.
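To make the two ingredients concrete, the sketch below is a minimal NumPy illustration, not the authors' implementation: it factorizes a folded weight tensor into TT-cores by sequential truncated SVDs (the standard TT-SVD procedure) and stores each core with a uniform 8-bit quantizer. The function names, tensor shape, and TT-rank here are illustrative assumptions.

    import numpy as np

    def tt_decompose(tensor, max_rank):
        """Factor a d-way tensor into TT-cores by sequential truncated SVDs (TT-SVD)."""
        shape, cores, r_prev = tensor.shape, [], 1
        mat = tensor
        for k in range(len(shape) - 1):
            mat = mat.reshape(r_prev * shape[k], -1)
            U, S, Vt = np.linalg.svd(mat, full_matrices=False)
            r = min(max_rank, len(S))                    # truncate to the TT-rank budget
            cores.append(U[:, :r].reshape(r_prev, shape[k], r))
            mat = np.diag(S[:r]) @ Vt[:r]                # carry the remainder forward
            r_prev = r
        cores.append(mat.reshape(r_prev, shape[-1], 1))
        return cores

    def quantize_uint8(x):
        """Uniform 8-bit quantization with an affine scale/offset."""
        lo, hi = float(x.min()), float(x.max())
        scale = max((hi - lo) / 255.0, 1e-12)
        q = np.round((x - lo) / scale).astype(np.uint8)
        return q, scale, lo                              # dequantize as q * scale + lo

    # Illustrative parameter count: a folded 64x64x64 weight tensor at TT-rank 8.
    W = np.random.randn(64, 64, 64)
    cores = tt_decompose(W, max_rank=8)
    quantized = [quantize_uint8(c) for c in cores]
    tt_params = sum(c.size for c in cores)
    print(f"dense: {W.size} floats, TT: {tt_params} params "
          f"({W.size / tt_params:.0f}x fewer), stored as uint8")

Under these assumptions, storage drops from the product of all mode sizes (held as 32-bit floats) to the much smaller sum of core sizes (held as 8-bit integers), which mirrors the multiplicative savings the abstract reports from combining TT decomposition with low-bit quantization.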

Keywords: 3DCNN; 8 bit inference; Neural network compression; Quantization; Tensor train decomposition.

MeSH terms

  • Data Compression
  • Imaging, Three-Dimensional
  • Neural Networks, Computer*