NorthEuraLex: a wide-coverage lexical database of Northern Eurasia

Lang Resour Eval. 2020;54(1):273-301. doi: 10.1007/s10579-019-09480-6. Epub 2019 Nov 30.

Abstract

This article describes the first release of a new lexicostatistical database of Northern Eurasia, which includes Europe as the most well-researched linguistic area. In other areas of the world, often sparse documentation restricts databases to a small number of concepts, covered as completely as the sources allow. In our target area, by contrast, good lexical resources providing wide coverage of the lexicon are available even for many smaller languages. This makes it possible to attain near-completeness for a substantial number of concepts. The resulting database provides a basis for rich benchmarks that can be used to test automated methods that aim to derive new knowledge about language history in underresearched areas.

Keywords: Caucasian languages; Indo-European languages; Lexical database; Northern Eurasia; Siberian languages; Turkic languages; Uralic languages.