Testing methods of linguistic homeland detection using synthetic data

Søren Wichmann; Taraka Rama

doi:10.1098/rstb.2020.0202

Philos Trans R Soc Lond B Biol Sci. 2021 May 10;376(1824):20200202. doi: 10.1098/rstb.2020.0202. Epub 2021 Mar 22.

Authors

Søren Wichmann¹ ², Taraka Rama³

Affiliations

¹ Leiden University Centre for Linguistics, Leiden University, Postbus 9515, Leiden 2300 RA, The Netherlands.
² Laboratory for Quantitative Linguistics, Kazan Federal University, Kremlevskaya Street 18, Kazan 420000, Russia.
³ Department of Linguistics, University of North Texas, Discovery Park Room B201, 3940 N Elm St., Suite B201, Denton, TX 76207, USA.

Abstract

Two families of quantitative methods have been used to infer geographical homelands of language families: Bayesian phylogeography and the 'diversity method'. Bayesian methods model how populations may have moved using a phylogenetic tree as a backbone, while the diversity method assumes that the geographical area where linguistic diversity is highest likely corresponds to the homeland. No systematic tests of the performances of the different methods in a linguistic context have so far been published. Here, we carry out performance testing by simulating language families, including branching structures and word lists, along with speaker populations moving in space. We test six different methods: two versions of BayesTraits; the relaxed random walk model of BEAST 2; our own RevBayes implementations of a fixed rate and a variable rates random walk model; and the diversity method. As a result of the tests, we propose a hierarchy of performance of the different methods. Factors such as geographical idiosyncrasies, incomplete sampling, tree imbalance and small family sizes all have a negative impact on performance, but mostly across the board, the performance hierarchy generally being impervious to such factors. This article is part of the theme issue 'Reconstructing prehistoric languages'.

Keywords: Bayesian phylogeography; historical linguistics; homelands; migration; phylogenetics.
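
Illustrative sketch

The abstract describes the diversity method only in outline: the homeland is assumed to lie where linguistic diversity is highest. The following is a minimal sketch of one way such a criterion could be computed, not the authors' implementation. It assumes, purely for illustration, that each language gets a diversity score equal to the mean ratio of its linguistic distance to its geographical distance over all other languages, and that the location of the highest-scoring language is taken as the homeland estimate. The function names, the 1 km distance floor and the toy data are all hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def diversity_homeland(coords, ling_dist):
    """Return (best_language, scores) under the assumed diversity score.

    coords:    dict mapping language name to (lat, lon)
    ling_dist: dict mapping frozenset({a, b}) to a linguistic distance
    """
    names = list(coords)
    scores = {}
    for a in names:
        ratios = []
        for b in names:
            if a == b:
                continue
            # Assumed floor of 1 km so co-located languages do not divide by zero.
            geo = max(haversine_km(coords[a], coords[b]), 1.0)
            ratios.append(ling_dist[frozenset((a, b))] / geo)
        scores[a] = sum(ratios) / len(ratios)
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): A and B are close in space but linguistically
# distant; C is far away and linguistically close to B.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 20.0)}
ling_dist = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "C")): 0.3,
}

homeland, scores = diversity_homeland(coords, ling_dist)
print(homeland, scores)
```

In this toy family the estimate falls on the A/B end of the range, where linguistically divergent languages sit close together. Unlike the Bayesian methods named in the abstract, a criterion of this kind uses no tree and no explicit movement model.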

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
Cultural Evolution*
Human Migration*
Humans
Language*
Linguistics / methods*
Phylogeny
Phylogeography

Associated data

Dryad/10.5061/dryad.4gg07