Extensive set of African ancestry-informative markers (AIMs) to study ancestry and population health

Front Genet. 2023 Feb 23;14:1061781. doi: 10.3389/fgene.2023.1061781. eCollection 2023.

Abstract

Introduction: Human populations are often highly structured due to differences in genetic ancestry among groups, which complicates associating genes with diseases. Ancestry-informative markers (AIMs) aid in the detection of population stratification and provide an alternative approach to map population-specific alleles to disease. Here, we identify and characterize a novel set of African AIMs that separate populations of African ancestry from other global populations, including those of European ancestry.

Methods: Using data from the 1000 Genomes Project, we selected highly informative SNP markers from five African subpopulations based on estimates of informativeness (In) and compared them against the European population to generate a final set of 46,737 African AIMs. The identified AIMs were validated using an independent dataset and functionally annotated using tools such as SIFT and PolyPhen. They were also assessed for representation on commonly used SNP arrays.

Results: This set of African AIMs effectively separates populations of African ancestry from other global populations and further identifies substructure between populations of African ancestry. When a subset of these AIMs was studied in an independent dataset, they differentiated people who self-identify as African American or Black from those who identify their ancestry as primarily European. Most of the AIMs were found in intergenic and intronic regions, with only 0.6% in the coding regions of the genome. Most of the commonly used SNP arrays investigated contained less than 10% of the AIMs.

Discussion: While several functional annotations of both coding and non-coding African AIMs are supported by the literature and link these high-frequency African alleles to diseases in African populations, more effort is needed to map genes to diseases in these genetically diverse subpopulations. The relative dearth of these African AIMs on current genotyping platforms (the array with the highest fraction, Illumina's Omni 5, harbors less than a quarter of the AIMs) further demonstrates the need to better represent historically understudied populations.
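
The Methods rank SNPs by estimates of informativeness (In) but the abstract does not say how that statistic was computed. As a minimal sketch, assuming In refers to the commonly used informativeness-for-assignment measure (natural-log form, equal weighting of populations), the per-SNP value for a biallelic marker could be computed as below. The function name `informativeness` and the allele frequencies are hypothetical and are not taken from the study.

```python
import math


def informativeness(freqs):
    """Informativeness for assignment (In) of one biallelic SNP.

    freqs: frequency of one allele in each of K populations
           (illustrative values, not data from the paper).
    Assumes the equal-weight, natural-log definition of In.
    """
    k = len(freqs)
    total = 0.0
    # Sum over both alleles: the given allele and its complement.
    for allele_freqs in (freqs, [1.0 - f for f in freqs]):
        mean = sum(allele_freqs) / k
        if mean > 0:
            total -= mean * math.log(mean)
        for p in allele_freqs:
            if p > 0:  # treat 0 * log(0) as 0
                total += (p / k) * math.log(p)
    return total


# A marker fixed for different alleles in two populations reaches the
# two-population maximum, ln(2); equal frequencies give zero.
fixed = informativeness([1.0, 0.0])
uninformative = informativeness([0.5, 0.5])
```

Under this definition, a higher In means the allele frequencies differ more sharply between the populations compared, which is the property a candidate AIM is selected for.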

Keywords: 1000 Genomes Project; African ancestry; AIMs; ancestry and health; population structure.

Grants and funding

This work was supported by the American Cancer Society (RSG-14-033-01-CPPB to CR) and the National Cancer Institute (CA006927). MR is partially supported by the William J. Avery Postdoctoral Research Fellowship (Fox Chase Cancer Center). SB, RJK, and CR are partially supported by the Chan Zuckerberg Initiative. This work is supported in part by 5P30CA006927 and by the TUFCCC/HC Regional Comprehensive Cancer Health Disparity Partnership, Award Number U54 CA221704 from the National Cancer Institute of the National Institutes of Health (NCI/NIH). Publication of this article was funded in part by the Temple University Libraries Open Access Publishing Fund. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NCI/NIH.