Sequence determinants, function, and evolution of CpG islands

Biochem Soc Trans. 2021 Jun 30;49(3):1109-1119. doi: 10.1042/BST20200695.

Abstract

In vertebrates, cytosine-guanine (CpG) dinucleotides are predominantly methylated, with ∼80% of all CpG sites containing 5-methylcytosine (5mC), a repressive mark associated with long-term gene silencing. The exceptions to this globally hypermethylated state are CpG-rich DNA sequences called CpG islands (CGIs), which are mostly hypomethylated relative to the bulk genome. CGIs overlap promoters from the earliest vertebrates to humans, indicating a concerted evolutionary drive compatible with CGI retention. CGIs are characterised by features that include DNA hypomethylation, elevated CpG and GC content, and the presence of transcription factor binding sites. These characteristics are consistent with the recruitment of transcription factors and chromatin-modifying enzymes, and with transcriptional activation in general. CGIs colocalise with sites of transcriptional initiation in hypermethylated vertebrate genomes; however, a growing body of evidence indicates that CGIs might exert their gene regulatory function in other genomic contexts. In this review, we discuss the diverse regulatory features of CGIs, their functional readout, and the evolutionary implications associated with CGI retention in vertebrates and possibly in invertebrates.
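
The sequence features named in the abstract (elevated CpG and GC content) can be made concrete with a small sketch. The abstract does not specify how these quantities are computed, so the following is an assumption: it uses the GC fraction and the widely used CpG observed/expected ratio, and the function names `gc_content` and `cpg_obs_exp` are hypothetical. No classification threshold is implied, since none is given here.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio for seq.

    Assumed definition (not taken from the abstract):
        (number of CG dinucleotides * sequence length) / (number of C * number of G)
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)
```

Under this definition, a ratio near 1 means CpG dinucleotides occur about as often as the base composition alone would predict, while bulk vertebrate DNA is typically well below that level.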

Keywords: CpG islands; DNA methylation; chromatin; orphan CpG islands.

Publication types

  • Review

MeSH terms

  • Animals
  • Binding Sites / genetics
  • Chromatin / genetics
  • Chromatin / metabolism
  • CpG Islands / genetics*
  • DNA Methylation*
  • Gene Expression Regulation*
  • Genome / genetics*
  • Humans
  • Promoter Regions, Genetic / genetics*
  • Transcription Factors / metabolism

Substances

  • Chromatin
  • Transcription Factors