LEI: A Novel Allele Frequency-Based Feature Selection Method for Multi-ancestry Admixed Populations

Sci Rep. 2019 Jul 31;9(1):11103. doi: 10.1038/s41598-019-47012-y.

Abstract

Next-generation sequencing technologies now make it possible to sequence and genotype hundreds of thousands of genetic markers across the human genome. Selecting informative markers from high-dimensional genomic datasets to characterize individual genomic makeup comprehensively has become common practice in evolutionary biology and human genetics. Although several feature selection approaches exist for determining ancestry proportions in two-way admixed populations, such as African Americans, few statistical tools have been developed for feature selection in three-way admixed populations, such as Latino populations. Herein, we present a new likelihood-based feature selection method called Lancaster Estimator of Independence (LEI), which uses allele frequency information to prioritize the features most informative for determining ancestry proportions from multiple ancestral populations in admixed individuals. LEI relies on summary-level statistics from allele frequency data, thereby avoiding the many restrictions (and big data issues) that can accompany access to individual-level genotype data; this makes it appealing for reducing the computational cost and time of ancestry inference in an admixed population. We compared our allele frequency-based approach with a genotype-based approach in estimating admixture proportions in three-way admixed population scenarios. Our results showed that ancestry estimates using the top-ranked features from LEI were comparable with estimates using features from genotype-based methods in three-way admixed populations. We provide easy-to-use R code to assist researchers in using LEI to develop allele frequency-based informative features for admixture mapping studies of mixed samples of multiple ancestral origins.
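
The general idea of ranking markers from population-level allele frequencies alone can be illustrated with a minimal sketch. This is not the LEI statistic, whose formula is not given in this abstract; as a stand-in it uses the informativeness-for-assignment measure of Rosenberg and colleagues, which also needs only allele frequencies. The function names, marker names, and frequencies below are hypothetical and chosen purely for illustration.

```python
import math


def _xlogx(x):
    """Return x * log(x), using the convention 0 * log(0) = 0."""
    return x * math.log(x) if x > 0.0 else 0.0


def informativeness(freqs):
    """Rosenberg-style informativeness for one biallelic marker.

    freqs: frequency of the reference allele in each ancestral
    population (for example, three values for a three-way admixture).
    Returns 0 when all populations share the same frequency, and grows
    as the frequencies diverge. This is a stand-in score, not LEI.
    """
    k = len(freqs)
    total = 0.0
    # Sum over both alleles: reference (p) and alternate (1 - p).
    for allele_freqs in (freqs, [1.0 - p for p in freqs]):
        mean = sum(allele_freqs) / k
        total += -_xlogx(mean) + sum(_xlogx(p) for p in allele_freqs) / k
    return total


def rank_markers(freq_table, top_n):
    """Return the names of the top_n markers by descending score."""
    ranked = sorted(
        freq_table.items(),
        key=lambda item: informativeness(item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]


# Hypothetical reference-allele frequencies in three ancestral
# populations (made-up numbers, not taken from the paper).
example = {
    "snp_a": [0.50, 0.50, 0.50],  # identical everywhere: uninformative
    "snp_b": [0.90, 0.10, 0.15],  # strongly differentiated
    "snp_c": [0.40, 0.45, 0.60],  # mildly differentiated
}

top = rank_markers(example, 2)
print(top)
```

Only the per-population frequency table is required as input, which mirrors the summary-statistics setting described above: no individual-level genotypes are read at any point.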

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Gene Frequency / genetics*
  • Genetic Markers / genetics
  • Genetics, Population / methods
  • Genome, Human / genetics
  • Genomics / methods
  • Genotype
  • Humans
  • Likelihood Functions
  • Software

Substances

  • Genetic Markers