A new linear combination method of haplogroup distribution central vectors to model population admixtures

Mol Genet Genomics. 2022 May;297(3):889-901. doi: 10.1007/s00438-022-01888-0. Epub 2022 Apr 11.

Abstract

We introduce a novel population genetic approach for modeling the origin and relationships of populations, based on new computational methods for analyzing haplogroup (Hg) frequency distributions. Hgs were sorted into groups that show correlated frequencies in subsets of populations, on the assumption that these correlations were established by ancient separation, migration and admixture processes. Populations are described with this universal Hg database; then, using unsupervised artificial intelligence, central vectors (CVs) are determined from local condensations of the Hg-distribution vectors in the multidimensional point system. Populations are clustered according to their proximity to the CVs. We show that CVs can be regarded as approximations of ancient populations, and that real populations can be modeled as weighted linear combinations of the CVs using a new linear combination algorithm based on a gradient search for the weights. The efficacy of the method is demonstrated by comparing Copper Age populations of the Carpathian Basin with populations of the Middle Ages and with modern Hungarians. Our analysis reveals significant population continuity since the Middle Ages, and the presence of a substrate component since the Copper Age.
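The idea of modeling a population as a weighted linear combination of CVs can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It swaps in a standard projected gradient descent on a least-squares objective, under the assumption that the weights are non-negative and sum to one. The function names, the step size, and the toy frequency vectors are all hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_admixture_weights(cvs, target, steps=2000, lr=0.5):
    """Gradient search for weights w minimizing ||w @ cvs - target||^2.

    cvs    : (k, d) array, one central vector per row (assumed given)
    target : (d,) Hg-frequency vector of the population to model
    """
    cvs = np.asarray(cvs, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.full(cvs.shape[0], 1.0 / cvs.shape[0])  # start from equal weights
    for _ in range(steps):
        residual = w @ cvs - target
        grad = 2.0 * cvs @ residual                # gradient of squared error
        w = project_to_simplex(w - lr * grad)      # keep weights valid
    return w


# Toy data (invented for illustration): three CVs over four Hg groups.
cvs = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])
# A synthetic "real" population built as 70% of CV 0 and 30% of CV 1.
target = 0.7 * cvs[0] + 0.3 * cvs[1]

weights = fit_admixture_weights(cvs, target)
print(np.round(weights, 3))
```

In this toy case the recovered weights should be close to the mixing proportions used to build the synthetic population. The simplex constraint is a design choice of the sketch that makes the weights readable as admixture fractions; the published method may constrain the weights differently.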

Keywords: Archaeogenetics; Artificial intelligence; Haplogroups; Self learning.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • DNA, Mitochondrial / genetics
  • Genetics, Population
  • Haplotypes / genetics
  • Hungary
  • Mercury*
  • Phylogeny

Substances

  • DNA, Mitochondrial
  • Mercury