K-mer Content Changes with Node Degree in Promoter-Enhancer Network of Mouse ES Cells

Int J Mol Sci. 2021 Jul 28;22(15):8067. doi: 10.3390/ijms22158067.

Abstract

Maps of Hi-C contacts between promoters and enhancers can be analyzed as networks, with cis-regulatory regions as nodes and their interactions as edges. Using the published promoter-enhancer network of mouse embryonic stem (ES) cells, we tested whether differences in node type (promoter or enhancer) and node degree (the number of regions interacting with a given promoter or enhancer) are reflected in the sequence composition or sequence similarity of the interacting nodes. We used counts of all k-mers (k = 4) to analyze sequence composition, and the Euclidean distance between k-mer count vectors (the k-mer distance) as the measure of sequence (dis)similarity. The results obtained with 4-mers are interpretable in terms of dinucleotides. Promoters are GC-rich compared to enhancers, as is already known. Enhancers are enriched in scaffold/matrix attachment region (S/MAR) patterns and depleted of CpGs. Furthermore, we show that promoters are more similar to their interacting enhancers than vice versa. Most notably, in both promoters and enhancers, GC content and CpG count increase with node degree. As a consequence, enhancers of higher node degree become more similar to promoters, whereas promoters of higher node degree become less similar to enhancers. We also confirmed the key results in human keratinocytes.
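
The quantities named in the abstract (4-mer count vectors, the Euclidean k-mer distance, GC content, and CpG count) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, and several details are assumptions the abstract does not specify: k-mers are counted in overlapping windows on a single strand, windows containing characters other than A, C, G, T are skipped, and raw counts are used without length normalization.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k=4):
    """Count overlapping k-mers on one strand (assumption).

    Returns a list of 4**k counts ordered lexicographically by k-mer
    (AAAA, AAAC, ..., TTTT for k = 4).
    """
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows with N or other symbols (assumption)
            counts[j] += 1
    return counts


def kmer_distance(seq_a, seq_b, k=4):
    """Euclidean distance between the raw k-mer count vectors of two sequences."""
    a = kmer_counts(seq_a, k)
    b = kmer_counts(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in ALPHABET)
    return (seq.count("G") + seq.count("C")) / total if total else 0.0


def cpg_count(seq):
    """Number of CG dinucleotides (occurrences of CG cannot overlap)."""
    return seq.upper().count("CG")


if __name__ == "__main__":
    # Toy sequences for illustration only, not real promoters or enhancers.
    promoter_like = "GCGCGGCGCCGCGGGCGCGC"
    enhancer_like = "ATATTTAAATATTTTAAATA"
    print(gc_content(promoter_like), cpg_count(promoter_like))
    print(gc_content(enhancer_like), cpg_count(enhancer_like))
    print(kmer_distance(promoter_like, enhancer_like))
```

Because raw counts scale with sequence length, a distance computed this way is only comparable between regions of similar length; dividing each vector by its total count would be one possible alternative, again an assumption rather than something stated in the abstract.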

Keywords: 4-mer; CpG; Hi-C; S/MAR; dinucleotide; embryonic stem cell.

MeSH terms

  • Animals
  • Base Composition
  • Computational Biology
  • CpG Islands
  • Enhancer Elements, Genetic*
  • Gene Regulatory Networks*
  • Humans
  • Keratinocytes / metabolism
  • Mice
  • Models, Genetic*
  • Mouse Embryonic Stem Cells / metabolism*