A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented Neural Networks

IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):8284-8296. doi: 10.1109/TNNLS.2022.3148818. Epub 2023 Oct 27.

Abstract

To limit the number of attention inference hops required in memory-augmented neural networks (MANNs), we propose an online adaptive approach called A2P-MANN. A small neural network classifier determines an adequate number of attention inference hops for each input query, which eliminates a large number of unnecessary computations in extracting the correct answer. To further lower computations in A2P-MANN, we suggest pruning the weights of the final fully connected (FC) layers. To this end, two pruning approaches are developed, one with negligible accuracy loss and the other with controllable loss on the final accuracy. The efficacy of the technique is assessed by applying it to two different MANN structures and two question answering (QA) datasets. The analytical assessment reveals, for the two benchmarks, on average, 50% fewer computations compared to the corresponding baseline MANNs at the cost of less than 1% accuracy loss. In addition, when used along with the previously published zero-skipping technique, a computation count reduction of approximately 70% is achieved. Finally, when the proposed approach (without zero skipping) is implemented on the CPU and GPU platforms, on average, a runtime reduction of 43% is achieved.
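
The adaptive-hop idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy MANN in which each hop applies softmax attention over memory slots and adds the read vector to the controller state, and it assumes the hop classifier is a single linear layer whose class k means "use k + 1 hops". The names `attention_hop`, `predict_hops`, and `answer_state`, along with all weights and dimensions, are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_hop(state, memory):
    # One hop (assumed form): attend over memory slots, then add the
    # attention-weighted read vector to the controller state.
    weights = softmax([dot(state, slot) for slot in memory])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(state))]
    return [s + r for s, r in zip(state, read)]


def predict_hops(query, clf_weights, clf_biases):
    # Hypothetical linear classifier: the arg-max class k maps to k + 1 hops.
    scores = [dot(w, query) + b for w, b in zip(clf_weights, clf_biases)]
    return scores.index(max(scores)) + 1


def answer_state(query, memory, clf_weights, clf_biases, max_hops=3):
    # Run only as many hops as the classifier requests, capped at max_hops.
    hops = min(predict_hops(query, clf_weights, clf_biases), max_hops)
    state = list(query)
    for _ in range(hops):
        state = attention_hop(state, memory)
    return state, hops


# Toy usage with made-up memory and classifier parameters.
memory = [[1.0, 0.0], [0.0, 1.0]]
clf_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
clf_biases = [0.0, 0.0, 0.0]
state_a, hops_a = answer_state([1.0, 0.0], memory, clf_weights, clf_biases)
state_b, hops_b = answer_state([0.0, 1.0], memory, clf_weights, clf_biases)
```

In this sketch the saving comes from skipping the remaining hops for queries that the classifier judges to be easy; a baseline would always run `max_hops` hops regardless of the query.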