Unsupervised Network Quantization via Fixed-Point Factorization

IEEE Trans Neural Netw Learn Syst. 2021 Jun;32(6):2706-2720. doi: 10.1109/TNNLS.2020.3007749. Epub 2021 Jun 2.

Abstract

The deep neural network (DNN) has achieved remarkable performance in a wide range of applications, but at the cost of huge memory consumption and computational complexity. Fixed-point network quantization has emerged as a popular acceleration and compression method but still suffers from severe performance degradation when extremely low-bit quantization is used. Moreover, current fixed-point quantization methods rely heavily on supervised retraining with large amounts of labeled training data, which are hard to obtain in real-world applications. In this article, we propose an efficient framework, namely, the fixed-point factorized network (FFN), to turn all weights into ternary values, i.e., {-1, 0, 1}. We highlight that the proposed FFN framework achieves negligible degradation even without any supervised retraining on labeled data. Note that the activations can easily be quantized into an 8-bit format; thus, the resulting networks require only low-bit fixed-point additions, which are significantly more efficient than 32-bit floating-point multiply-accumulate operations (MACs). Extensive experiments on large-scale ImageNet classification and on MS COCO object detection show that the proposed FFN achieves more than 20x compression and removes most of the multiplication operations with comparable accuracy. Code is available on GitHub at https://github.com/wps712/FFN.
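
To make the factorization idea concrete, the sketch below shows one classical way to split a weight matrix into ternary factors: a greedy semidiscrete decomposition (SDD), W ~ X diag(d) Y^T with entries of X and Y restricted to {-1, 0, +1}. This is a minimal illustration under that assumption, not the authors' exact algorithm or pipeline (the full FFN procedure, including weight recovery, is in the linked repository), and the helper names greedy_sdd and _best_ternary are hypothetical.

import numpy as np


def _best_ternary(s):
    # Return the ternary vector x in {-1, 0, 1}^m maximizing (x.T @ s)^2 / (x.T @ x):
    # take the k largest |s_i| with their signs, choosing the best support size k.
    order = np.argsort(-np.abs(s))
    csum = np.cumsum(np.abs(s)[order])
    k = int(np.argmax(csum ** 2 / np.arange(1, len(s) + 1))) + 1
    x = np.zeros_like(s, dtype=np.float64)
    x[order[:k]] = np.sign(s[order[:k]])
    return x


def greedy_sdd(W, rank, inner_iters=10):
    # Greedy rank-`rank` SDD: W ~ X @ diag(d) @ Y.T with X, Y ternary and d >= 0.
    R = W.astype(np.float64).copy()            # residual explained one rank-1 term at a time
    m, n = W.shape
    X, Y, d = np.zeros((m, rank)), np.zeros((n, rank)), np.zeros(rank)
    for t in range(rank):
        y = np.sign(R[np.argmax(np.sum(np.abs(R), axis=1))])   # init y from the largest row
        y[y == 0] = 1.0
        for _ in range(inner_iters):           # alternate between the two ternary factors
            x = _best_ternary(R @ y)
            y = _best_ternary(R.T @ x)
        den = (x @ x) * (y @ y)
        d[t] = (x @ R @ y) / den if den > 0 else 0.0           # optimal nonnegative scale
        X[:, t], Y[:, t] = x, y
        R -= d[t] * np.outer(x, y)             # peel off this ternary rank-1 term
    return X, d, Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32))          # toy weight matrix
    X, d, Y = greedy_sdd(W, rank=48)
    approx = (X * d) @ Y.T                     # equals X @ diag(d) @ Y.T
    print("relative error:", np.linalg.norm(W - approx) / np.linalg.norm(W))

In such a factorized layer, multiplying an (8-bit-quantized) activation vector by the ternary factors reduces to additions and subtractions plus a per-term scaling by d, which is the source of the MAC savings described above.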