Local ancestry prediction with PyLAE

PeerJ. 2021 Dec 14;9:e12502. doi: 10.7717/peerj.12502. eCollection 2021.

Abstract

Summary: We developed PyLAE, a new tool for determining local ancestry along a genome from whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations, with or without an informative prior. Because PyLAE does not require estimating many parameters, it can process thousands of genomes within a day, and it runs on either phased or unphased genomic data. We show how PyLAE can be applied to identify differentially enriched pathways between populations; the local ancestry approach yields higher enrichment scores than whole-genome approaches. We benchmarked PyLAE on the 1000 Genomes dataset, comparing the aggregated predictions with global admixture results and with the current gold-standard program, RFMix. Computational efficiency, minimal data pre-processing requirements, straightforward presentation of results, and ease of installation make PyLAE a valuable tool for studying admixed populations.
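
The benchmarking step compares aggregated local predictions with global admixture results. As a rough illustration of what such aggregation could look like, the sketch below collapses per-window ancestry labels into genome-wide proportions. The function name, the label strings, and the window lengths are hypothetical and chosen for illustration only; this is not PyLAE's API or output format.

```python
from collections import Counter


def aggregate_local_ancestry(window_labels, window_lengths=None):
    """Collapse per-window ancestry labels into genome-wide proportions.

    window_labels  : one ancestry label per genomic window (hypothetical format)
    window_lengths : optional window sizes (e.g. in SNPs or base pairs);
                     if omitted, every window is weighted equally
    """
    if window_lengths is None:
        window_lengths = [1] * len(window_labels)
    if len(window_labels) != len(window_lengths):
        raise ValueError("labels and lengths must have the same number of windows")

    # Sum the length assigned to each ancestry label.
    totals = Counter()
    for label, length in zip(window_labels, window_lengths):
        totals[label] += length

    genome_length = sum(totals.values())
    if genome_length == 0:
        return {}

    # Normalise so the proportions sum to 1.
    return {label: total / genome_length for label, total in totals.items()}


# Hypothetical local ancestry calls for four windows of one genome.
labels = ["EUR", "EUR", "AFR", "EUR"]
lengths = [100, 100, 200, 100]

weighted = aggregate_local_ancestry(labels, lengths)
unweighted = aggregate_local_ancestry(labels)
```

Weighting by window length matters when windows differ in size: here the single AFR window covers 40% of the genome even though it is only one of four windows. The resulting proportions are the kind of per-individual summary that could then be set beside global admixture estimates.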

Availability and implementation: The source code and installation manual are available at https://github.com/smetam/pylae.

Keywords: 1000 Genomes; Bio-origin; Global ancestry; HMM; Local ancestry; Selection signals.

Grants and funding

The authors received no funding for this work.