Exploring Dimensionality Reduction Techniques in Multilingual Transformers

Álvaro Huertas-García; Alejandro Martín; Javier Huertas-Tato; David Camacho

doi:10.1007/s12559-022-10066-8

Cognit Comput. 2023;15(2):590-612. doi: 10.1007/s12559-022-10066-8. Epub 2022 Oct 29.

Authors

Álvaro Huertas-García¹, Alejandro Martín¹, Javier Huertas-Tato¹, David Camacho¹

Affiliation

¹ Departamento de Sistemas Informáticos, Universidad Politécnica de Madrid, Madrid, Spain.

Abstract

In both the scientific literature and industry, semantic and context-aware Natural Language Processing-based solutions have been gaining importance in recent years. The possibilities and performance these models show on complex Human Language Understanding tasks are unquestionable, from conversational agents to the fight against disinformation in social networks. Considerable attention is also being paid to developing multilingual models to tackle the language bottleneck. The growing need for more complex models that implement all these features has been accompanied by an increase in model size, with little restraint in the number of dimensions required. This paper aims to provide a comprehensive account of the impact of a wide variety of dimensionality reduction techniques on the performance of different state-of-the-art multilingual siamese transformers, including unsupervised dimensionality reduction techniques such as linear and nonlinear feature extraction, feature selection, and manifold techniques. To evaluate the effects of these techniques, we considered the multilingual extended version of the Semantic Textual Similarity Benchmark (mSTSb) and two baseline approaches, one using the embeddings from the pre-trained version of five models and another using their fine-tuned STS version. The results show that it is possible to achieve an average reduction of $91.58\% \pm 2.59\%$ in the number of dimensions of embeddings from pre-trained models, requiring a fitting time $96.68\% \pm 0.68\%$ faster than the fine-tuning process. In addition, we achieve a $54.65\% \pm 32.20\%$ dimensionality reduction in embeddings from fine-tuned models. The results of this study will contribute significantly to the understanding of how different tuning approaches affect performance on semantic-aware tasks, how dimensionality reduction techniques deal with the high-dimensional embeddings computed for the STS task, and what potential they hold for other highly demanding NLP tasks.

Keywords: Dimensionality reduction; Language models; Multilingual transformers; Natural language processing; Semantic textual similarity.
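
The evaluation idea summarized in the abstract (reduce sentence embeddings, then check how well pairwise similarity still tracks the gold scores) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' pipeline: PCA stands in for the family of linear feature extraction techniques, the embeddings are synthetic random vectors instead of transformer outputs on mSTSb, and the sizes (768 input dimensions, 64 retained) and helper names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA with an SVD; returns the feature mean and the top principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered data onto the retained principal axes."""
    return (X - mean) @ components.T

def cosine_rows(A, B):
    """Row-wise cosine similarity between two equally shaped matrices."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return num / den

def spearman(x, y):
    """Spearman rank correlation; assumes there are no tied values."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic stand-in for sentence-pair embeddings (assumption: not real mSTSb data).
rng = np.random.default_rng(0)
n_pairs, dim, latent_dim, reduced_dim = 200, 768, 16, 64

# Low-dimensional "meaning" vectors; each pair shares a random amount of content.
z_a = rng.normal(size=(n_pairs, latent_dim))
alpha = rng.uniform(0.0, 1.0, size=(n_pairs, 1))
z_b = alpha * z_a + np.sqrt(1.0 - alpha**2) * rng.normal(size=(n_pairs, latent_dim))
gold = cosine_rows(z_a, z_b)  # plays the role of human similarity scores

# Lift to a high-dimensional space and add a little noise.
W = rng.normal(size=(latent_dim, dim))
emb_a = z_a @ W + 0.1 * rng.normal(size=(n_pairs, dim))
emb_b = z_b @ W + 0.1 * rng.normal(size=(n_pairs, dim))

# Fit the reduction on all sentence embeddings, then reduce both sides of each pair.
mean, components = pca_fit(np.vstack([emb_a, emb_b]), reduced_dim)
red_a = pca_transform(emb_a, mean, components)
red_b = pca_transform(emb_b, mean, components)

rho_full = spearman(cosine_rows(emb_a, emb_b), gold)
rho_reduced = spearman(cosine_rows(red_a, red_b), gold)
reduction_pct = 100.0 * (1.0 - reduced_dim / dim)

print(f"dimensions: {dim} -> {reduced_dim} ({reduction_pct:.2f}% reduction)")
print(f"Spearman (full):    {rho_full:.3f}")
print(f"Spearman (reduced): {rho_reduced:.3f}")
```

On real data, `emb_a` and `emb_b` would be replaced by model embeddings of the two sentences in each pair and `gold` by the annotated similarity scores; the reduction would normally be fitted on a training split and evaluated on a held-out split.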