Driving mosaicism: somatic variants in reference population databases and effect on variant interpretation in rare genetic disease

Hum Genomics. 2021 Dec 14;15(1):71. doi: 10.1186/s40246-021-00371-y.

Abstract

Background: Genetic variation databases provide invaluable information on the presence and frequency of genetic variants in the 'untargeted' human population; these data are aggregated primarily to facilitate the interpretation of clinically important variants. The presence of somatic variants in such databases can affect variant assessment in patients with undiagnosed rare diseases (RDs). Previously, the impact of somatic mosaicism had been considered for only two Mendelian disease-associated genes. Here, we extend the analysis to identify additional mosaicism-prone genes in blood-derived reference population databases.

Results: To identify additional mosaicism-prone genes relevant to RDs, we focused on established ClinVar pathogenic and likely pathogenic single-nucleotide variants residing in genes associated with early-onset, severe autosomal dominant diseases. We asked whether any of these variants are present at a higher-than-expected frequency in the reference population databases. We also asked whether there is evidence of somatic origin (i.e., allelic imbalance) rather than germline heterozygosity, where approximately half of the reads support the alternative allele. The mosaicism-prone genes identified were further categorized according to the processes they are involved in. Beyond the previously reported ASXL1 and DNMT3A, we identified 7 additional autosomal dominant RD-associated genes with known pathogenic single-nucleotide variants present in the reference population databases and good evidence of allelic imbalance: BRAF, CBL, FGFR3, IDH2, KRAS, PTPN11 and SETBP1. Of these 9 genes, the majority (n = 7) are important for hematopoiesis, and 4 are involved in cell proliferation. Further assessment of 156 known hematopoietic genes identified 48 genes (21 not yet associated with RDs) with at least some evidence of mosaicism detectable in reference population databases.
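The germline-versus-somatic distinction above rests on allele balance. A germline heterozygous call is expected to have roughly half of its reads supporting the alternative allele. A clonal-hematopoiesis variant carried by only a subset of blood cells typically shows a lower variant allele fraction (VAF).

The sketch below shows one way to flag a single carrier's call. It assumes per-carrier alternative-allele and total read counts are available and uses an exact two-sided binomial test against p = 0.5. It is an illustration, not the authors' published pipeline. The function names and the thresholds `alpha` and `max_vaf` are hypothetical choices.

```python
from math import exp, lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log-probability of exactly k alternative reads out of n (requires 0 < p < 1).

    Working in log space avoids int-to-float overflow at high read depth.
    """
    return (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
    )


def two_sided_binom_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count k."""
    observed = log_binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        lp = log_binom_pmf(i, n, p)
        # Small tolerance so symmetric ties are not lost to rounding error.
        if lp <= observed + 1e-7:
            total += exp(lp)
    return min(1.0, total)


def classify_call(alt_reads: int, depth: int,
                  alpha: float = 1e-3, max_vaf: float = 0.35) -> str:
    """Label one carrier's call by allele balance.

    alpha and max_vaf are hypothetical thresholds, not values from the paper.
    """
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    vaf = alt_reads / depth
    p_value = two_sided_binom_p(alt_reads, depth)
    if p_value >= alpha:
        return "consistent_with_germline_het"
    if vaf < max_vaf:
        return "possible_somatic"    # VAF clearly below the ~0.5 germline expectation
    if vaf > 0.5:
        return "imbalanced_high"     # above 0.5: other causes (e.g. copy-number change)
    return "modest_imbalance"        # significant but small deviation from 0.5


for alt, dp in [(50, 100), (8, 100), (92, 100)]:
    print(alt, dp, classify_call(alt, dp))
```

The separate effect-size cutoff (`max_vaf`) matters because at high depth even small deviations from 0.5, for example from reference-mapping bias, become statistically significant. Calls with VAF above 0.5 are kept apart from low-VAF calls because they can have other explanations, such as copy-number changes or loss of heterozygosity.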

Conclusions: These results stress the importance of considering genes involved in hematopoiesis and cell proliferation when interpreting the presence and frequency of genetic variants in blood-derived reference population databases, both public and private. This is especially important when considering new variants of uncertain significance in known hematopoietic/cell proliferation RD genes and future novel gene-disease associations involving this class of genes.

Keywords: Blood-derived reference population databases; Cell proliferation; Clonal hematopoiesis of indeterminate potential (CHIP); Genome sequencing; Hematopoietic genes; Rare diseases.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Databases, Genetic
  • Humans
  • Mosaicism*
  • Rare Diseases* / genetics