A model for representing the semantics of MWEs: From lexical semantics to the semantic annotation of complex predicates

Voula Giouli

doi:10.3389/frai.2023.802218


Front Artif Intell. 2023 Mar 23:6:802218. doi: 10.3389/frai.2023.802218. eCollection 2023.

Author

Voula Giouli¹

Affiliation

¹ ATHENA Research Centre, Institute for Language and Speech Processing, Maroussi, Greece.

Abstract

Multiword expressions (MWEs) are sequences of words that pose a challenge to the computational processing of human languages due to their idiosyncrasies and the mismatch between their phrasal structure and their semantics. These idiosyncrasies are lexical, morphosyntactic and semantic in nature, namely: non-compositionality, i.e., the meaning of the expression cannot be computed from the meanings of its constituents; discontinuity, i.e., alien elements may intervene between the constituents; non-substitutability, i.e., at least one of the constituents is lexicalized and therefore does not enter into alternations on the paradigmatic axis; and non-modifiability, in that MWEs occur in syntactically rigid structures that impose further constraints on modification, transformations, etc. The paper presents a model for representing MWEs at the level of semantics that takes all these inherent idiosyncrasies into account. The model takes the form of a linguistic ontology and is applied to Greek verbal multiword expressions (VMWEs); moreover, the semantics of the lexical entries under scrutiny is also represented via the semantics of their arguments, based on corpus evidence. In this regard, the modeling of VMWE semantics is placed at the lexicon-corpus interface.
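The abstract lists four idiosyncrasies that the model encodes, together with the semantics of each expression's arguments. As a minimal sketch, and not the author's ontology (whose classes and properties the abstract does not spell out), the Python below records these properties as Boolean features of a hypothetical `VMWEEntry` lexicon record. It attaches semantic-role labels to argument slots and consults the discontinuity feature when locating an entry in a lemmatized sentence. All class, field and function names, the `max_gap` parameter, and the English example entries (standing in for Greek data for readability) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Argument:
    """A semantic argument slot of a VMWE (hypothetical representation)."""
    role: str                      # semantic role label, e.g. "Agent"
    lexicalized: bool              # True if the slot is filled by a fixed word
    lemma: Optional[str] = None    # the fixed lemma, if lexicalized


@dataclass(frozen=True)
class VMWEEntry:
    """Hypothetical lexicon entry recording the four MWE idiosyncrasies."""
    lemma: str
    components: Tuple[str, ...]    # lexicalized lemmas in canonical order
    compositional: bool            # False: meaning not computable from the parts
    allows_gaps: bool              # True: alien elements may intervene (discontinuity)
    substitutable: bool            # False: lexicalized parts resist paradigmatic alternation
    modifiable: bool               # False: syntactically rigid, resists modification
    arguments: Tuple[Argument, ...] = ()


def _match_rest(components: Sequence[str], lemmas: Sequence[str],
                pos: int, max_gap: int) -> bool:
    """Backtracking check that `components` occur in order from `pos`,
    with at most `max_gap` intervening tokens before each component."""
    if not components:
        return True
    for j in range(pos, min(len(lemmas), pos + max_gap + 1)):
        if lemmas[j] == components[0] and _match_rest(
                components[1:], lemmas, j + 1, max_gap):
            return True
    return False


def occurs_in(entry: VMWEEntry, lemmas: Sequence[str], max_gap: int = 3) -> bool:
    """Return True if the entry's components occur in a lemmatized sentence.
    Intervening tokens are permitted only when the entry allows discontinuity."""
    gap = max_gap if entry.allows_gaps else 0
    first, rest = entry.components[0], entry.components[1:]
    return any(
        lemma == first and _match_rest(rest, lemmas, i + 1, gap)
        for i, lemma in enumerate(lemmas)
    )


# Illustrative English entries; the feature values are for demonstration only.
TAKE_DECISION = VMWEEntry(
    lemma="take decision",
    components=("take", "decision"),
    compositional=False,   # treated here as semi-compositional (light verb)
    allows_gaps=True,
    substitutable=False,
    modifiable=True,
    arguments=(Argument(role="Agent", lexicalized=False),),
)

KICK_BUCKET = VMWEEntry(
    lemma="kick the bucket",
    components=("kick", "the", "bucket"),
    compositional=False,
    allows_gaps=False,
    substitutable=False,
    modifiable=False,
    arguments=(Argument(role="Patient", lexicalized=False),),
)

print(occurs_in(TAKE_DECISION, ["they", "take", "a", "quick", "decision"]))  # True
print(occurs_in(KICK_BUCKET, ["he", "kick", "the", "old", "bucket"]))        # False
```

Here the light-verb entry tolerates an intervening determiner and adjective, while the rigid idiom is no longer matched once a modifier intervenes. An ontology-based model might instead express such features as classes and properties rather than flat Boolean fields; the flat record is chosen only to keep the sketch short.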

Keywords: Semantic Role Labeling (SRL); lexical semantics; linguistic ontology; semantic relations; semantic representation; verbal MWEs.

Grants and funding

The research leading to the results presented in this article was partially funded by the project “Computational Science and Technologies: Data, Content and Interaction. Language Technologies for Content and Interaction Analysis” (MIS 5002437), which was co-financed by Greece and the EU (Partnership Agreement 2014-2020, Operational Program “Competitiveness Entrepreneurship Innovation” 2017-2019). The lexicon API and the annotations were also funded by the project “AIO_ILSP: Lexical Resource Infrastructures”, which was financed by the Institute for Language and Speech Processing, ATHENA Research Centre.