Orphan and gene related CpG Islands follow power-law-like distributions in several genomes: evidence of function-related and taxonomy-related modes of distribution

Comput Biol Chem. 2014 Dec:53 Pt A:84-96. doi: 10.1016/j.compbiolchem.2014.08.013. Epub 2014 Sep 16.

Abstract

CpG Islands (CGIs) are compositionally defined short genomic stretches that have been studied in the human, mouse and chicken genomes and, later, in several others. Initially, they were assigned the role of transcriptional regulation of protein-coding genes, especially the housekeeping ones; more recently, evidence has emerged that they are involved in several other functions as well, which might include regulation of the expression of RNA genes, DNA replication, etc. Here, an investigation of their distributional characteristics in a variety of genomes is undertaken, both for whole CGI populations and for CGI subsets that lie away from known genes (gene-unrelated or "orphan" CGIs). In both cases, power-law-like linearity in double logarithmic scale is found. An evolutionary model, initially put forward to explain a similar pattern found in gene populations, is implemented. It includes segmental duplication events and eliminations of most of the duplicated CGIs, while a moderate rate of non-duplicated CGI eliminations is also applied in some cases. Simulations reproduce all the main features of the observed inter-CGI chromosomal size distributions. Our results on the power-law-like linearity found in orphan CGI populations suggest that the observed distributional pattern is independent of the analogous pattern that protein-coding segments were reported to follow. On the basis of the proposed evolutionary model, the power-law-like patterns in the genomic distributions of CGIs described herein are found to be compatible with several other features of the composition, abundance or functional role of CGIs reported in the current literature across several genomes.
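The kind of analysis summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the parameter values (keep_prob, loss_prob, the segment length range) and the choice of a cumulative count N(>=S) as the size distribution are all assumptions made for illustration. The sketch places markers (standing in for CGIs) on a chromosome, applies segmental duplications in which only a small fraction of the copied markers survives, optionally removes an occasional marker, and then measures the slope of the inter-marker distance distribution in double logarithmic scale.

```python
import math
import random


def inter_marker_distances(positions):
    """Gaps between consecutive marker positions on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]


def cumulative_size_distribution(distances):
    """Return (size, number of distances >= size) pairs, i.e. N(>=S)."""
    ordered = sorted(distances)
    n = len(ordered)
    points = []
    for i, size in enumerate(ordered):
        if i == 0 or size != ordered[i - 1]:
            points.append((size, n - i))
    return points


def loglog_slope(points):
    """Least-squares slope of log10(count) against log10(size)."""
    xs = [math.log10(s) for s, _ in points]
    ys = [math.log10(c) for _, c in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate(n_markers=500, length=1_000_000, steps=200,
             keep_prob=0.1, loss_prob=0.0, seed=0):
    """Toy duplication/elimination model (all parameters are assumptions)."""
    rng = random.Random(seed)
    positions = [rng.randrange(length) for _ in range(n_markers)]
    for _ in range(steps):
        # Segmental duplication: copy a random segment to a random point.
        seg_len = rng.randrange(1, length // 20)
        start = rng.randrange(0, length - seg_len)
        insert_at = rng.randrange(0, length)
        offsets = [p - start for p in positions
                   if start <= p < start + seg_len]
        # Most duplicated markers are eliminated; each survives with keep_prob.
        kept = [insert_at + off for off in offsets if rng.random() < keep_prob]
        # Markers downstream of the insertion point are shifted.
        positions = [p + seg_len if p >= insert_at else p for p in positions]
        positions.extend(kept)
        length += seg_len
        # Simplification: occasionally drop one marker chosen at random,
        # without tracking whether it is a duplicate or an original.
        if positions and rng.random() < loss_prob:
            positions.pop(rng.randrange(len(positions)))
    return sorted(positions), length


if __name__ == "__main__":
    markers, final_length = simulate()
    points = cumulative_size_distribution(inter_marker_distances(markers))
    print("markers:", len(markers), "chromosome length:", final_length)
    print("log-log slope of N(>=S):", loglog_slope(points))
```

A slope fitted over the whole range is only a rough summary. In practice the linear region of the log-log plot would need to be identified first, and the toy parameters above are not tuned to any genome.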

Keywords: CGIs; CpG dinucleotide; CpG-Islands; Genome evolution; Orphan CpG-Islands; Power-law-like distribution.

MeSH terms

  • Animals
  • Biological Evolution
  • Chromosome Duplication
  • Chromosome Mapping
  • Computer Simulation
  • CpG Islands*
  • Genome*
  • Humans
  • Models, Genetic*
  • Phylogeny
  • Sequence Analysis, DNA / statistics & numerical data*
  • Statistical Distributions