Improving imputation quality in Samoans through the integration of population-specific sequences into existing reference panels

medRxiv [Preprint]. 2023 Oct 31:2023.10.31.23297835. doi: 10.1101/2023.10.31.23297835.

Abstract

Genotype imputation is fundamental to association studies, and yet even gold standard panels like TOPMed are limited in the populations for which they yield good imputation. Specifically, Pacific Islanders are poorly represented in extant panels. To address this, we constructed an imputation reference panel using 1,285 Samoan individuals with whole-genome sequencing, combined with 1000 Genomes (1000G) samples, to create a reference panel that better represents Pacific Islander, specifically Samoan, genetic variation. We compared this panel to 1000G and TOPMed panels based on imputed variants using genotyping array data for 1,834 Samoan participants who were not part of the panels. The 1000G + 1285 Samoan panel yielded up to 2.25-2.76 times more well-imputed (r² ≥ 0.80) variants than TOPMed and 1000G. There was improved imputation accuracy across the minor allele frequency (MAF) spectrum, although it was more pronounced for variants with 0.01 ≤ MAF ≤ 0.05. Imputation accuracy (r²) was greater for population-specific variants (high fixation index, F_ST) and those from larger haplotypes (high LD score). The gain in imputation accuracy over TOPMed was largest for small haplotypes (low LD score), reflecting the Samoan panel's ability to capture population-specific variation not well tagged by other panels. We also augmented the 1000G reference panel with varying numbers of Samoan samples and found that panels with 48 or more Samoans included outperformed TOPMed for all variants with MAF ≥ 0.001. This study identifies variants with improved imputation using population-specific reference panels and provides a framework for constructing other population-specific reference panels.
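
The panel comparison described above rests on counting well-imputed variants (r² ≥ 0.80) within MAF ranges and taking the ratio between panels. A minimal Python sketch of that bookkeeping follows. It is not the authors' pipeline: the function names, the bin edges other than the 0.001 floor and the 0.01 to 0.05 range, the half-open bin convention, and the toy (MAF, r²) pairs are all illustrative assumptions.

```python
from collections import Counter

R2_THRESHOLD = 0.80  # "well-imputed" cutoff stated in the abstract

# Assumed MAF bins (half-open, last bin closed). Only the 0.001 floor and
# the 0.01-0.05 range are mentioned in the abstract; the rest is illustrative.
MAF_BINS = [(0.001, 0.01), (0.01, 0.05), (0.05, 0.5)]


def maf_bin(maf):
    """Return the (low, high) bin containing maf, or None if outside all bins."""
    for i, (low, high) in enumerate(MAF_BINS):
        is_last = i == len(MAF_BINS) - 1
        if low <= maf < high or (is_last and maf == high):
            return (low, high)
    return None


def count_well_imputed(variants, threshold=R2_THRESHOLD):
    """Count variants with r2 >= threshold in each MAF bin.

    variants: iterable of (maf, r2) pairs for one reference panel.
    """
    counts = Counter()
    for maf, r2 in variants:
        bin_ = maf_bin(maf)
        if bin_ is not None and r2 >= threshold:
            counts[bin_] += 1
    return counts


def fold_improvement(panel_a, panel_b):
    """Per-bin ratio of well-imputed counts, panel_a over panel_b.

    Bins where panel_b has no well-imputed variants are skipped
    because the ratio is undefined there.
    """
    a = count_well_imputed(panel_a)
    b = count_well_imputed(panel_b)
    return {bin_: a[bin_] / b[bin_] for bin_ in MAF_BINS if b[bin_] > 0}


# Hypothetical toy data, not results from the study.
toy_augmented = [(0.005, 0.85), (0.02, 0.91), (0.03, 0.88), (0.20, 0.97)]
toy_baseline = [(0.005, 0.60), (0.02, 0.82), (0.03, 0.70), (0.20, 0.95)]

if __name__ == "__main__":
    for bin_, ratio in fold_improvement(toy_augmented, toy_baseline).items():
        print(bin_, ratio)
```

In practice the (MAF, r²) pairs would come from the per-variant output of whichever imputation software was used, and the estimated r² reported by that software may differ from an empirical r² computed against held-out genotypes.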

Publication types

  • Preprint