Exome variant discrepancies due to reference-genome differences

Am J Hum Genet. 2021 Jul 1;108(7):1239-1250. doi: 10.1016/j.ajhg.2021.05.011. Epub 2021 Jun 14.

Abstract

Despite the release of the GRCh38 human reference genome more than seven years ago, GRCh37 remains the more widely used assembly in most research and clinical laboratories. To date, no study has quantified the impact of using different reference assemblies on the identification of variants associated with rare and common diseases from large-scale exome-sequencing data. By calling variants on both the GRCh37 and GRCh38 references, we identified single-nucleotide variants (SNVs) and insertions and deletions (indels) in 1,572 exomes from participants with Mendelian diseases and their family members. We found that 1.5% of SNVs and 2.0% of indels were discordant when different references were used. Notably, 76.6% of the discordant variants were clustered within discrete discordant reference patches (DISCREPs) comprising only 0.9% of loci targeted by exome sequencing. These DISCREPs were enriched for genomic elements including segmental duplications, fix patch sequences, and loci known to contain alternate haplotypes. We identified 206 genes significantly enriched for discordant variants, most of which were in DISCREPs and caused by multi-mapped reads on the reference assembly that lacked the variant call. Among these 206 genes, eight are implicated in known Mendelian diseases and 53 are associated with common phenotypes from genome-wide association studies. Variant interpretations could also be influenced by the reference after variant loci were lifted over to another assembly. Overall, we identified genes and genomic loci affected by reference assembly choice, including genes associated with Mendelian disorders and complex human diseases that require careful evaluation in both research and clinical applications.
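The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes both call sets are already expressed in one coordinate system (for example, after lifting GRCh37 calls over to GRCh38), and it assumes a discordant variant is one present in exactly one call set, with the rate taken over the union of both. The names `Variant`, `discordant_variants`, `discordance_rate`, and `fraction_in_regions` are hypothetical, and the abstract does not state the exact definitions used in the paper.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """A variant key; coordinates are assumed to be 1-based, as in VCF."""
    chrom: str
    pos: int
    ref: str
    alt: str


# A region is (chrom, start, end), assumed 1-based and inclusive.
Region = Tuple[str, int, int]


def discordant_variants(calls_a: Set[Variant], calls_b: Set[Variant]) -> Set[Variant]:
    """Variants called on exactly one of the two references (assumed definition)."""
    return calls_a ^ calls_b


def discordance_rate(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Discordant variants as a fraction of all variants seen on either reference."""
    union = calls_a | calls_b
    if not union:
        return 0.0
    return len(discordant_variants(calls_a, calls_b)) / len(union)


def fraction_in_regions(variants: Iterable[Variant], regions: Iterable[Region]) -> float:
    """Fraction of the given variants that fall inside any of the given regions."""
    variants = list(variants)
    regions = list(regions)
    if not variants:
        return 0.0
    inside = sum(
        1
        for v in variants
        if any(v.chrom == c and start <= v.pos <= end for c, start, end in regions)
    )
    return inside / len(variants)


if __name__ == "__main__":
    # Toy data for illustration only; these are not variants from the study.
    v1 = Variant("chr1", 150, "A", "G")
    v2 = Variant("chr1", 900, "C", "T")
    v3 = Variant("chr2", 300, "G", "GA")
    v4 = Variant("chr2", 500, "T", "C")

    calls_on_ref_a = {v1, v2, v3}
    calls_on_ref_b = {v2, v3, v4}

    print(discordance_rate(calls_on_ref_a, calls_on_ref_b))
    print(fraction_in_regions(
        discordant_variants(calls_on_ref_a, calls_on_ref_b),
        [("chr1", 100, 200)],
    ))
```

A linear scan over regions keeps the sketch short. For exome-scale data an interval index would be the more practical choice.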

Keywords: GRCh37; GRCh38; Human Genome Reference; clinical genome sequencing; exome sequencing; hg19.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cohort Studies
  • Exome*
  • Genetic Diseases, Inborn / genetics
  • Genome, Human*
  • Humans
  • Polymorphism, Single Nucleotide*
  • Reference Values