Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing

HGG Adv. 2022 Nov 11;4(1):100159. doi: 10.1016/j.xhgg.2022.100159. eCollection 2023 Jan 12.

Abstract

Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of human references are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for quality control, imputation, and association analysis. In the present study, we show that single-nucleotide variants (SNVs) in regions inverted between different builds of the reference genome are often mishandled bioinformatically. Depending on the array type, SNVs are found in approximately 2-5 Mb of the genome that are inverted between reference builds. Coordinate conversions of these variants are mishandled by both the TOPMed imputation server as well as routine in-house quality control pipelines, leading to underrecognized downstream analytical consequences. Specifically, we observe that undetected allelic conversion errors for palindromic (i.e., A/T or C/G) variants in these inverted regions would destabilize the local haplotype structure, leading to loss of imputation accuracy and power in association analyses. Though only a small proportion of the genome is affected, these regions include important disease susceptibility variants that would be affected. For example, the p value of a known locus associated with prostate cancer on chromosome 10 (chr10) would drop from 2.86 × 10-7 to 0.0011 in a case-control analysis of 20,286 Africans and African Americans (10,643 cases and 9,643 controls). We devise a straight-forward heuristic based on the popular tool, liftOver, that can easily detect and correct these variants in the inverted regions between genome builds to locally improve imputation accuracy.

Keywords: bioinformatics; genetic associations; genome build; imputation; reference genome.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Black or African American
  • Genome, Human / genetics
  • Genome-Wide Association Study*
  • Genomics*
  • Haplotypes / genetics
  • Humans
  • Male