Prediction of CpG Islands as an Intrinsic Clustering Property Found in Many Eukaryotic DNA Sequences and Its Relation to DNA Methylation

Methods Mol Biol. 2018:1766:31-47. doi: 10.1007/978-1-4939-7768-0_3.

Abstract

The promoter region of around 70% of all genes in the human genome is overlapped by a CpG island (CGI). CGIs have known functions in transcription initiation and distinctive compositional features, such as high G+C content and high CpG ratios relative to bulk DNA. We have previously shown that CGIs manifest as clusters of CpGs in mammalian genomes and can therefore be detected using clustering methods. These techniques have several advantages over sliding-window approaches, which apply compositional properties as thresholds. In this protocol we show how to determine local (CpG islands) and global (distance distribution) clustering properties of CG dinucleotides and how to generalize this analysis to any k-mer or combination of k-mers. In addition, we illustrate how to cross-reference the output of a CpG island prediction algorithm with our methylation database to detect differentially methylated CGIs. The analysis is given as a step-by-step protocol, and all necessary programs are provided in a virtual machine or, alternatively, can be downloaded and installed.
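The idea of treating CGIs as clusters of CpGs can be illustrated with a minimal Python sketch. It is not the software distributed with the protocol: the function names, the fixed `max_gap` distance threshold, and the `min_count` filter are assumptions made for illustration only, and no statistical significance is assigned to the clusters. The sketch locates every occurrence of a k-mer (CG by default), computes the distances between consecutive occurrences (the global property), and merges occurrences closer than the threshold into clusters (the local property).

```python
from typing import List, Tuple


def kmer_positions(seq: str, kmer: str = "CG") -> List[int]:
    """Return 0-based start positions of every occurrence of kmer (overlaps allowed)."""
    seq, kmer = seq.upper(), kmer.upper()
    positions = []
    start = seq.find(kmer)
    while start != -1:
        positions.append(start)
        start = seq.find(kmer, start + 1)
    return positions


def gap_distribution(positions: List[int]) -> List[int]:
    """Distances between consecutive k-mer start positions (global clustering property)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def cluster_by_distance(
    positions: List[int],
    max_gap: int,          # hypothetical fixed threshold, chosen by the user
    min_count: int = 2,    # hypothetical minimum number of k-mers per cluster
    kmer_len: int = 2,
) -> List[Tuple[int, int, int]]:
    """Group neighbouring k-mers whose start-to-start distance is <= max_gap.

    Returns (start, end_exclusive, kmer_count) for each cluster.
    """
    clusters: List[Tuple[int, int, int]] = []
    if not positions:
        return clusters
    run_start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev <= max_gap:
            count += 1
        else:
            # Gap too large: close the current run and start a new one.
            if count >= min_count:
                clusters.append((run_start, prev + kmer_len, count))
            run_start, count = pos, 1
        prev = pos
    if count >= min_count:
        clusters.append((run_start, prev + kmer_len, count))
    return clusters


if __name__ == "__main__":
    toy = "CGCGCG" + "A" * 20 + "CGTCG" + "A" * 20 + "CG"
    pos = kmer_positions(toy)
    print(gap_distribution(pos))
    print(cluster_by_distance(pos, max_gap=5))
```

Passing a different `kmer` (with a matching `kmer_len`) applies the same analysis to any DNA word; in a real analysis the distance threshold would be derived from the observed distance distribution rather than fixed by hand.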

Keywords: Clustering; CpG islands; DNA methylation; DNA words; Virtual machine.

MeSH terms

  • Animals
  • Base Composition
  • Base Sequence
  • CpG Islands / genetics*
  • DNA / chemistry
  • DNA / genetics
  • DNA / metabolism
  • DNA Methylation*
  • Genome, Human / genetics*
  • Humans
  • Promoter Regions, Genetic / genetics
  • Software
  • Transcription Initiation, Genetic

Substances

  • DNA