Sentence-level complexity in Russian: An evaluation of BERT and graph neural networks

Front Artif Intell. 2022 Dec 8:5:1008411. doi: 10.3389/frai.2022.1008411. eCollection 2022.

Abstract

Introduction: Sentence-level complexity evaluation (SCE) can be formulated as assigning a given sentence a complexity score, either as a category or as a single value. The SCE task can be treated as an intermediate step for text complexity prediction, text simplification, lexical complexity prediction, etc. Moreover, robust prediction of the complexity of a single sentence requires much shorter text fragments than those typically needed to robustly evaluate text complexity. Morphosyntactic and lexical features have proven to be vital predictors in state-of-the-art deep neural models for sentence categorization. However, the interpretability of deep neural network results remains a common issue.

Methods: This paper tests and compares several approaches to predicting both absolute and relative sentence complexity in Russian. The evaluation involves a Russian BERT model, a Transformer, an SVM with features derived from sentence embeddings, and a graph neural network. Such a comparison is performed for the first time for the Russian language.
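To make the BERT-based setup concrete, the following is a minimal sketch of sentence-level complexity classification with a pre-trained Russian BERT via the Hugging Face transformers library. The checkpoint name, number of complexity classes, and example sentence are illustrative assumptions; the abstract does not specify the paper's exact fine-tuning configuration.

```python
# Minimal sketch of sentence-level complexity classification with a
# pre-trained Russian BERT. The checkpoint name, number of complexity
# classes, and example sentence are illustrative assumptions, not the
# paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "DeepPavlov/rubert-base-cased"  # assumed Russian BERT checkpoint
NUM_CLASSES = 3                              # assumed number of complexity categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CLASSES
)  # the classification head is randomly initialized and must be fine-tuned

sentence = "Мама мыла раму."  # toy Russian sentence
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, NUM_CLASSES)

predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted complexity class: {predicted_class}")
```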

Results and discussion: Pre-trained language models outperform graph neural networks that incorporate the syntactic dependency tree of a sentence. The graph neural networks, in turn, perform better than Transformer and SVM classifiers that employ sentence embeddings. Predictions of the proposed graph neural network architecture can be easily explained.
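As an illustration of the graph-based approach, below is a minimal sketch of a graph neural network operating on a sentence's syntactic dependency tree, written with PyTorch Geometric. The layer types, feature dimensions, and toy dependency edges are assumptions for demonstration, not the paper's reported architecture.

```python
# Illustrative sketch (not the paper's exact architecture) of a graph
# neural network over a syntactic dependency tree, using PyTorch Geometric.
# Node features, edge list, and layer sizes are assumed for demonstration.
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class DependencyTreeGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.classifier = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, batch):
        # Message passing over dependency edges
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        # Pool token (node) representations into one sentence vector
        h = global_mean_pool(h, batch)
        return self.classifier(h)

# Toy example: a 3-token sentence whose dependency tree has edges
# root -> token1 and root -> token2 (made bidirectional for the GCN).
x = torch.randn(3, 16)                     # assumed 16-dim token features
edge_index = torch.tensor([[0, 1, 0, 2],
                           [1, 0, 2, 0]])  # COO edge list
batch = torch.zeros(3, dtype=torch.long)   # all nodes belong to one graph

model = DependencyTreeGNN(in_dim=16, hidden_dim=32, num_classes=3)
logits = model(x, edge_index, batch)       # shape: (1, 3)
```

Because each node corresponds to a token and each edge to a dependency relation, per-node contributions to the pooled sentence vector can be inspected directly, which is one way such a graph model lends itself to explanation.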

Keywords: BERT; Russian language; graph neural networks; sentence embeddings; sentence-level complexity; text complexity.