Factor analysis of ancient population genomic samples

Nat Commun. 2020 Sep 16;11(1):4661. doi: 10.1038/s41467-020-18335-6.

Abstract

Recent years have seen a growing number of studies investigating evolutionary questions using ancient DNA. To address these questions, one of the most frequently used methods is principal component analysis (PCA). When PCA is applied to temporal samples, however, the sample dates are ignored during the analysis, which leads to imperfect representations of samples in PC plots. Here, we present a factor analysis (FA) method in which individual scores are corrected for the effect of allele frequency drift over time. We obtained exact solutions for the estimates of corrected factors, and we provided a fast algorithm for their computation. Using computer simulations and ancient European samples, we compared the geometric representations obtained from FA with those obtained from PCA and from ancestry estimation programs. In admixture analyses, FA estimates agreed with tree-based statistics, and they were more accurate than those obtained from PCA projections and from ancestry estimation programs. A major advantage of FA over existing approaches is that it improves descriptive analyses of ancient DNA samples without requiring the inclusion of outgroup or present-day samples.
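The abstract starts from the observation that PCA never uses sample dates. The Python sketch below illustrates that point. It is not the authors' FA method or their software. It simulates one hypothetical Wright-Fisher population sampled at five time points, computes standard PCA scores from the locus-centered genotype matrix with NumPy's SVD, and only afterwards compares PC1 with the sampling dates. Every parameter (number of loci, effective size `ne`, generations between samples, random seed) is an arbitrary assumption chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def simulate_temporal_genotypes(n_loci=2000, n_times=5, n_per_time=10,
                                ne=200, gens_between=20, seed=1):
    """Toy model (hypothetical parameters): one Wright-Fisher population
    sampled at n_times dates, with drift acting between sampling events."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=n_loci)  # ancestral allele frequencies
    genotypes, dates = [], []
    for t in range(n_times):
        # Diploid genotypes coded 0/1/2, drawn from the current frequencies.
        genotypes.append(rng.binomial(2, freqs, size=(n_per_time, n_loci)))
        # Date = generations elapsed since the first sampling event.
        dates.extend([t * gens_between] * n_per_time)
        # Genetic drift: resample 2*Ne gene copies every generation.
        for _ in range(gens_between):
            freqs = rng.binomial(2 * ne, freqs) / (2 * ne)
    return np.vstack(genotypes).astype(float), np.array(dates, dtype=float)


def pca_scores(X, k=2):
    """Standard PCA scores via SVD of the locus-centered genotype matrix.
    Sample dates are not an input to this computation."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


X, dates = simulate_temporal_genotypes()
scores = pca_scores(X)
# Dates are used only here, after the fact, to inspect the PC1 axis.
r = np.corrcoef(scores[:, 0], dates)[0, 1]
print(f"samples x loci: {X.shape}, |corr(PC1, date)| = {abs(r):.2f}")
```

In a toy run like this, there is no population structure other than drift through time, yet PC1 tends to order samples by date. This is the kind of temporal effect that the FA correction described above is designed to account for. The sketch does not implement that correction.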

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • DNA, Ancient / analysis*
  • England
  • Europe
  • Factor Analysis, Statistical*
  • Gene Frequency
  • Genetic Drift
  • Genetics, Population / statistics & numerical data
  • Genome, Human*
  • Humans
  • Metagenomics / statistics & numerical data*
  • Models, Genetic
  • Principal Component Analysis

Substances

  • DNA, Ancient