Menzerath's Law in the Syntax of Languages Compared with Random Sentences

Kumiko Tanaka-Ishii

doi:10.3390/e23060661

Entropy (Basel). 2021 May 25;23(6):661. doi: 10.3390/e23060661.

Author

Kumiko Tanaka-Ishii¹

Affiliation

¹ Research Center for Advanced Technology, The University of Tokyo, Tokyo 153-8904, Japan.

Abstract

Menzerath's law is considered to reflect one aspect of the complexity underlying natural language. The law suggests that, for a linguistic unit, the size (y) of its constructs decreases as the number (x) of constructs in the unit increases. This article investigates the property at the syntactic level, taking x as the number of constituents modifying the main predicate of a sentence and y as the size of those constituents measured in words. Following previous articles that demonstrated the Menzerath property in dependency corpora, such as those for Czech and Ukrainian, this article first examines how well the property holds across languages, using the entire Universal Dependencies dataset, version 2.3 (76 languages over 129 corpora), together with the Penn Treebank (PTB). The results show that the law holds reasonably well for x > 2. Then, for comparison, the property is investigated in syntactically randomized sentences generated from the PTB. These results show that the property is almost reproduced even by simple random data. Further analysis of the property highlights more detailed characteristics of natural language.
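
The definitions of x and y given in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not specify: a sentence is represented as a list of 1-based head indices (0 marks the main predicate, as in CoNLL-U style dependency data), each constituent is taken to be the full subtree under a direct dependent of the predicate, and y is averaged over those constituents. The function names `menzerath_point` and `menzerath_curve` are hypothetical and are not taken from the article.

```python
from collections import defaultdict
from statistics import mean


def menzerath_point(heads):
    """Return (x, y) for one sentence.

    heads[i] is the 1-based index of the head of word i+1; 0 marks the root,
    which is assumed here to be the main predicate.
    x = number of direct dependents of the root,
    y = mean number of words in the subtrees under those dependents.
    """
    children = defaultdict(list)
    root = None
    for idx, head in enumerate(heads, start=1):
        if head == 0:
            root = idx
        else:
            children[head].append(idx)
    if root is None:
        raise ValueError("sentence has no root")

    def subtree_size(node):
        # Count the node and all of its descendants with an explicit stack.
        size, stack = 0, [node]
        while stack:
            cur = stack.pop()
            size += 1
            stack.extend(children[cur])
        return size

    sizes = [subtree_size(child) for child in children[root]]
    return len(sizes), (mean(sizes) if sizes else 0.0)


def menzerath_curve(sentences):
    """Average y over all sentences that share the same x (assumed aggregation)."""
    by_x = defaultdict(list)
    for heads in sentences:
        x, y = menzerath_point(heads)
        if x > 0:
            by_x[x].append(y)
    return {x: mean(ys) for x, ys in sorted(by_x.items())}


# "The quick fox chased a small mouse yesterday", root = "chased" (word 4)
example = [3, 3, 4, 0, 7, 7, 4, 4]
x, y = menzerath_point(example)
print(x, round(y, 2))
```

In the example sentence the predicate has three constituents of 3, 3, and 1 words. Under the law, the averaged y returned by `menzerath_curve` would be expected to fall as x grows, which the abstract reports holds reasonably well for x > 2.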

Keywords: Menzerath law; complexity; natural language; syntax.

Grants and funding

20K20492, 21H03493/Japan Society for the Promotion of Science