The genome-wide landscape of C:G > T:A polymorphism at the CpG contexts in the human population

BMC Genomics. 2020 Mar 30;21(1):270. doi: 10.1186/s12864-020-6674-1.

Abstract

Background: The C:G > T:A substitution at CpG dinucleotides is the most frequent substitution type in genome evolution. This mutational process is ongoing in the human germline; however, its impact on common and rare genomic polymorphisms has not yet been comprehensively investigated. Here we characterized the landscape and dynamics of C:G > T:A substitutions in population-scale human genome sequencing datasets, including ~4300 whole genomes from the 1000 Genomes Project and the Pan-Cancer Analysis of Whole Genomes (PCAWG) project and ~60,000 whole exomes from the Exome Aggregation Consortium (ExAC) database.

Results: Of the 28,084,558 CpG sites in the human reference genome, 26.0% show a C:G > T:A substitution in the dataset. CpGs in CpG islands (CGIs) show such substitutions much less frequently (5.6%). The substitution frequency is not uniform across CGIs: intragenic CGIs have a significantly higher C:G > T:A substitution rate than other CGI types. For non-CGI CpGs, the mutation rate was positively correlated with distance from the nearest CGI, up to 2 kb. Finally, we found evidence of negative selection against coding CpG mutations that result in an amino acid change.
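
The headline figure above is a fraction: CpG sites with at least one observed C:G > T:A substitution, divided by all CpG sites in the reference. As a minimal illustration of that bookkeeping, not the authors' pipeline, the Python sketch below counts CpG sites in a toy sequence and computes the fraction hit by such a substitution. The function names, the 0-based coordinates, and the choice to count each CpG dinucleotide once (so that a C>T at the C and a G>A at the paired G map to the same site) are all assumptions made for this example.

```python
def cpg_positions(seq):
    """Return 0-based positions of the C in each forward-strand CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_site_of_transition(seq, pos, ref, alt):
    """Map a SNV to the CpG site it disrupts, or return None.

    A C:G > T:A change at a CpG appears as C>T at the C of the dinucleotide,
    or as G>A at the G (the same event read from the opposite strand).
    The site is identified by the position of its C (assumed convention).
    """
    s = seq.upper()
    if ref == "C" and alt == "T":
        if pos + 1 < len(s) and s[pos] == "C" and s[pos + 1] == "G":
            return pos
    elif ref == "G" and alt == "A":
        if pos >= 1 and s[pos] == "G" and s[pos - 1] == "C":
            return pos - 1
    return None


def fraction_of_cpgs_substituted(seq, variants):
    """Fraction of CpG sites with at least one C:G > T:A variant.

    `variants` is an iterable of (pos, ref, alt) tuples with 0-based pos.
    """
    sites = cpg_positions(seq)
    if not sites:
        return 0.0
    hit = set()
    for pos, ref, alt in variants:
        site = cpg_site_of_transition(seq, pos, ref, alt)
        if site is not None:
            hit.add(site)
    return len(hit) / len(sites)


# Toy data, invented for illustration only.
toy_seq = "ACGTTCGACG"          # CpG sites start at positions 1, 5 and 8
toy_variants = [
    (1, "C", "T"),              # C>T at the first CpG
    (2, "G", "A"),              # G>A at the same CpG, counted once
    (6, "G", "A"),              # G>A at the second CpG
    (3, "T", "C"),              # not a CpG transition, ignored
]
toy_fraction = fraction_of_cpgs_substituted(toy_seq, toy_variants)
```

Here two of the three toy CpG sites carry a qualifying substitution. Restricting `cpg_positions` to CGI or non-CGI intervals would give the per-category fractions in the same way, provided the interval annotation is supplied separately.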

Conclusions: This study provides the first unbiased estimate of the rate of C:G > T:A substitution at CpG dinucleotides, using population-scale human genome sequencing data. Our findings provide insights into the dynamics of mutation acquisition in the human genome.

Keywords: CpG; CpG island; Methylation; Single nucleotide polymorphism; Transition.

MeSH terms

  • CpG Islands / genetics*
  • DNA Methylation / genetics
  • DNA Methylation / physiology
  • Humans
  • Mutation / genetics
  • Polymorphism, Single Nucleotide / genetics