Linguistic Behavior of Well-Defined Strings in the Non-Coding Human Genome

Nonlinear Dynamics Psychol Life Sci. 2022 Jan;26(1):1-19.

Abstract

In this article we perform a top-down analysis of the non-protein-coding human genome using well-defined parameters, resulting in what we call γ-strings. We show that there are a total of 45,371,328 distinct γ-strings in the human non-protein-coding genome. We explore the statistical properties of the γ-strings and demonstrate that they share many characteristics with words in human language. We indicate how they are 'packed' in the chromosomes and show that each chromosome has its own specific γ-dictionary. We also outline our future work exploring the linguistic features of γ-strings and γ-text using methods developed for the study of human natural language.

MeSH terms

  • Genome, Human*
  • Humans
  • Language*
  • Linguistics