Sparse factorization of square matrices with application to neural attention modeling

Neural Netw. 2022 Aug:152:160-168. doi: 10.1016/j.neunet.2022.04.014. Epub 2022 Apr 22.

Abstract

Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in both memory and time, so an economical approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint becomes a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. The approximation needs only N(log N)² non-zero numbers for an N×N full matrix. Our new method is especially useful for scalable neural attention modeling. Unlike conventional scaled dot-product attention methods, we train neural networks to map input data to the non-zero entries of the factorizing matrices. The sparse factorization method is tested on various square matrices, and the experimental results demonstrate that it gives a better approximation when the approximated matrix is sparse and high-rank. As an attention module, our new method outperforms the Transformer and several of its variants on long sequences in synthetic data sets and in the Long Range Arena benchmark. Our code is publicly available.
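A minimal sketch may help make the core idea concrete: approximate a dense N×N matrix by a product of a few sparse full-rank factors and fit their non-zero values by gradient descent. The chord-like sparsity pattern, the factor count log₂N, and all names below are illustrative assumptions rather than the authors' exact construction (which uses about N(log N)² non-zeros in total and, for attention, predicts the non-zeros from the input).

# Hypothetical sketch: approximate a dense N x N matrix A with a product of
# log2(N) sparse factors and fit the non-zero values by gradient descent.
# The chord-like sparsity pattern is an illustrative assumption, not the
# exact construction used in the paper.
import math
import torch

torch.manual_seed(0)
N = 64                              # matrix size (power of two for this toy pattern)
K = int(math.log2(N))               # number of sparse factors

# Target: a sparse, high-rank-style matrix, the regime the abstract highlights.
A = torch.zeros(N, N)
idx = torch.randint(0, N, (2, 4 * N))
A[idx[0], idx[1]] = torch.randn(4 * N)

# Sparsity masks: row i of factor k keeps columns i and (i + 2**k) mod N,
# so each factor holds only 2N trainable non-zeros.
masks, params = [], []
rows = torch.arange(N)
for k in range(K):
    m = torch.zeros(N, N, dtype=torch.bool)
    m[rows, rows] = True
    m[rows, (rows + 2 ** k) % N] = True
    masks.append(m)
    params.append(torch.nn.Parameter(0.1 * torch.randn(int(m.sum()))))

opt = torch.optim.Adam(params, lr=1e-2)
for step in range(2000):
    prod = torch.eye(N)
    for m, p in zip(masks, params):
        w = torch.zeros(N, N).masked_scatter(m, p)   # place trainable non-zeros
        prod = prod @ w
    loss = torch.norm(prod - A) ** 2 / torch.norm(A) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"relative squared Frobenius error: {loss.item():.4f}")

Under this pattern each factor stores O(N) values instead of O(N²), so the total parameter count grows only as O(N log N) for the toy sketch; the paper's richer pattern spends N(log N)² non-zeros to cover high-rank targets that low-rank factorizations cannot reach.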

Keywords: Attention modeling; Matrix factorization; Neural networks; Sparse.

MeSH terms

  • Algorithms*
  • Attention
  • Machine Learning
  • Neural Networks, Computer*