Power-laws in the genomic distribution of coding segments in several organisms: an evolutionary trace of segmental duplications, possible paleopolyploidy and gene loss

Gene. 2009 Nov 1;447(1):18-28. doi: 10.1016/j.gene.2009.04.028. Epub 2009 Jul 8.

Abstract

Large-scale features of the spatial arrangement of protein-coding segments (PCS) are investigated through the size distributions of the spacers between consecutive PCS, which have been found to follow power laws. Linearity on a double-logarithmic scale extends over several orders of magnitude in the genomes of organisms as disparate as mammals, insects and plants. This feature is also present in the most compact eukaryotic genomes and in half of the examined bacteria, despite their very limited non-coding space. We have tried to determine the sequence of events in the course of genome evolution that may account for the formation of the observed size distributions. The proposed mechanism essentially includes two types of events: (i) segmental duplications (and possibly paleopolyploidy), and (ii) the subsequent loss of most of the duplicated genes. Computer simulations show that this scenario generates power-law-like spacer size distributions that remain robust for a variety of parameter choices, even when insertion of external sequences, such as viruses or proliferating retroelements, is included. Moreover, power laws are preserved even after most of the non-coding DNA has been removed, which explains the presence of this pattern in genomes as compact as that of Takifugu rubripes.
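The duplication-and-loss scenario can be explored with a toy simulation. The sketch below is a hypothetical, simplified model and not the authors' simulation. It assumes a genome made of equal-length units (1 = coding, 0 = non-coding), tandem segmental duplications of uniformly random length up to a hypothetical `max_seg`, and gene loss modeled as each coding unit in the new copy becoming non-coding with a hypothetical probability `loss_prob`. Spacer sizes are the non-coding run lengths between coding units, and the double-logarithmic linearity is checked with a least-squares slope of the complementary cumulative distribution. All function and parameter names are illustrative assumptions.

```python
import math
import random
from collections import Counter


def simulate_genome(n_events=2000, init_len=200, coding_frac=0.5,
                    max_seg=50, loss_prob=0.9, seed=1):
    """Hypothetical toy model: 1 = coding unit, 0 = non-coding unit.

    Each event copies a random contiguous segment, inserts the copy right
    after the original (tandem duplication), and turns each coding unit of
    the copy into non-coding with probability loss_prob (gene loss).
    """
    rng = random.Random(seed)
    genome = [1 if rng.random() < coding_frac else 0 for _ in range(init_len)]
    for _ in range(n_events):
        n = len(genome)
        seg_len = rng.randint(1, min(max_seg, n))
        start = rng.randint(0, n - seg_len)
        copy = [0 if (u == 1 and rng.random() < loss_prob) else u
                for u in genome[start:start + seg_len]]
        genome[start + seg_len:start + seg_len] = copy  # insert the copy
    return genome


def spacer_sizes(genome):
    """Lengths of non-coding runs lying between two coding units.

    Adjacent coding units are treated as one PCS (zero-length gaps are
    skipped); non-coding runs at the genome ends are not counted.
    """
    sizes, run, seen_coding = [], 0, False
    for u in genome:
        if u == 1:
            if seen_coding and run > 0:
                sizes.append(run)
            seen_coding, run = True, 0
        else:
            run += 1
    return sizes


def loglog_slope(sizes):
    """Least-squares slope of log P(S >= s) versus log s.

    For a density p(s) ~ s**(-mu), the complementary CDF scales as
    s**(-(mu - 1)), so a straight line here indicates power-law behavior.
    """
    counts = Counter(sizes)
    n = len(sizes)
    xs, ys, remaining = [], [], n
    for s in sorted(counts):
        xs.append(math.log(s))
        ys.append(math.log(remaining / n))  # fraction of spacers >= s
        remaining -= counts[s]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    genome = simulate_genome()
    sizes = spacer_sizes(genome)
    print(len(genome), len(sizes), round(loglog_slope(sizes), 2))
```

Whether this toy model reproduces the straight log-log lines reported in the abstract depends on the assumed parameters. Varying `max_seg` and `loss_prob`, or adding whole-genome duplication events, is one way to probe the robustness the abstract describes.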

MeSH terms

  • Animals
  • Evolution, Molecular*
  • Gene Duplication*
  • Genome / genetics*
  • Humans
  • Open Reading Frames / genetics*
  • Polyploidy
  • Takifugu / genetics