Inference of Population Structure from Time-Series Genotype Data

Am J Hum Genet. 2019 Aug 1;105(2):317-333. doi: 10.1016/j.ajhg.2019.06.002. Epub 2019 Jun 27.

Abstract

Sequencing ancient DNA can offer direct probing of population history. Yet, such data are commonly analyzed with standard tools that assume DNA samples are all contemporary. We present DyStruct, a model and inference algorithm for inferring shared ancestry from temporally sampled genotype data. DyStruct explicitly incorporates temporal dynamics by modeling individuals as mixtures of unobserved populations whose allele frequencies drift over time. We develop an efficient inference algorithm for our model using stochastic variational inference. On simulated data, we show that DyStruct outperforms the current state of the art when individuals are sampled over time. Using a dataset of 296 modern and 80 ancient samples, we demonstrate that DyStruct is able to capture a well-supported admixture event of steppe ancestry into modern Europe. We further apply DyStruct to a genome-wide dataset of 2,067 modern and 262 ancient samples used to study the origin of farming in the Near East. We show that DyStruct provides new insight into population history when compared with alternative approaches, within a feasible run time.
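
To make the modeling idea concrete, the following is a minimal illustrative sketch of the kind of generative process the abstract describes: latent populations whose allele frequencies drift over time, and individuals sampled at different times whose genotypes are drawn from a mixture of those populations. It is not the DyStruct implementation. The function names (`drift_step`, `simulate_genotypes`), the normal approximation to drift, the clamping bounds, the default population size, and the convention that time runs forward in generations from the oldest sample are all assumptions made for illustration.

```python
import random


def drift_step(freq, generations, pop_size, rng):
    """Advance one allele frequency by `generations` of drift.

    Assumption: drift is approximated by a normal step with variance
    freq * (1 - freq) * generations / (2 * pop_size).
    """
    variance = freq * (1.0 - freq) * generations / (2.0 * pop_size)
    new_freq = rng.gauss(freq, variance ** 0.5)
    # Clamp so the frequency stays a valid probability (illustrative bounds).
    return min(max(new_freq, 0.001), 0.999)


def simulate_genotypes(num_pops, sample_times, mixtures,
                       num_loci=10, pop_size=5000, seed=0):
    """Simulate diploid genotypes (0, 1, or 2 alternate alleles per locus).

    sample_times[i]: generation at which individual i is sampled, counted
                     forward from the start of the simulation.
    mixtures[i]:     ancestry proportions of individual i over the latent
                     populations (each list should sum to 1).
    """
    rng = random.Random(seed)
    # Initial frequencies: freqs[k][l] for population k at locus l.
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(num_loci)]
             for _ in range(num_pops)]
    genotypes = [None] * len(sample_times)
    now = 0
    # Visit individuals from oldest to most recent sampling time.
    for i in sorted(range(len(sample_times)), key=lambda j: sample_times[j]):
        elapsed = sample_times[i] - now
        if elapsed > 0:
            freqs = [[drift_step(f, elapsed, pop_size, rng) for f in pop]
                     for pop in freqs]
            now = sample_times[i]
        row = []
        for locus in range(num_loci):
            # Individual-specific allele frequency: mixture over populations.
            p = sum(w * freqs[k][locus] for k, w in enumerate(mixtures[i]))
            # Two independent allele draws give a diploid genotype.
            row.append(sum(rng.random() < p for _ in range(2)))
        genotypes[i] = row
    return genotypes


if __name__ == "__main__":
    # Three individuals: an unadmixed ancient sample, an admixed sample
    # 50 generations later, and an unadmixed sample at generation 100.
    data = simulate_genotypes(
        num_pops=2,
        sample_times=[0, 50, 100],
        mixtures=[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    )
    for row in data:
        print(row)
```

In this sketch the ancestry proportions are fixed inputs. In an inference setting such as the one the abstract describes, the proportions and the drifting frequencies would instead be unobserved quantities estimated from the genotypes.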

Keywords: ancient DNA; population structure; time-series; variational inference.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Europe
  • Gene Frequency
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Genetics, Population*
  • Genome-Wide Association Study
  • Genotype
  • Humans
  • Middle East
  • Models, Genetic*
  • Models, Statistical*
  • Population Groups / genetics*
  • Time Factors