Informational structure of two closely related eukaryotic genomes

Phys Rev E Stat Nonlin Soft Matter Phys. 2006 Aug;74(2 Pt 1):021913. doi: 10.1103/PhysRevE.74.021913. Epub 2006 Aug 15.

Abstract

Attempts to identify a species on the basis of its DNA sequence on purely statistical grounds have been formulated for more than a decade. The most prominent of such genome signatures relies on neighborhood correlations (i.e., dinucleotide frequencies) and, consequently, attributes species identification to mechanisms operating on the dinucleotide level (e.g., neighbor-dependent mutations). Using Mus musculus and Rattus norvegicus as examples, we analyze short- and intermediate-range statistical correlations in DNA sequences. These correlation profiles are computed for all chromosomes of the two species. We find that, with increasing range of correlations, the capacity to distinguish between the species on the basis of this correlation profile improves and requires ever shorter sequence segments to achieve full species separation. This finding suggests that distinctive traits within the sequence are situated beyond the level of a few nucleotides. The large-scale statistical patterning of DNA sequences on which such genome signatures are based is thus substantially determined by mobile elements (e.g., transposons and retrotransposons). The study and interspecies comparison of such correlation profiles can, therefore, reveal features of retrotransposition, segmental duplications, and other processes of genome evolution.
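
To make the idea of a correlation profile concrete, the following is a minimal Python sketch. The abstract does not state which correlation measure was used, so the choice of mutual information between nucleotides separated by a distance k is an assumption made for illustration, and the function names mutual_information and correlation_profile are hypothetical. Under this assumption, k = 1 corresponds to the dinucleotide (nearest-neighbor) level, and larger k probes the intermediate-range correlations the abstract refers to.

```python
import math
from collections import Counter

def mutual_information(seq, k):
    """Mutual information (in bits) between nucleotides k positions apart.

    Assumption: this measure is chosen for illustration only; the abstract
    does not specify the correlation measure used in the study.
    """
    seq = seq.upper()
    # Collect symbol pairs at distance k, skipping ambiguous bases such as N.
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)
             if seq[i] in "ACGT" and seq[i + k] in "ACGT"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                  # counts of (a, b) pairs
    left = Counter(a for a, _ in pairs)     # marginal counts, first symbol
    right = Counter(b for _, b in pairs)    # marginal counts, second symbol
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written in terms of raw counts
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi

def correlation_profile(seq, k_max):
    """Correlation values for distances k = 1 .. k_max."""
    return [mutual_information(seq, k) for k in range(1, k_max + 1)]
```

In such a sketch, a profile would be computed per chromosome (or per sequence segment) and the resulting vectors compared between species. How profiles are compared and how segments are sized is not described in the abstract and is left open here.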

MeSH terms

  • Animals
  • Chromosome Mapping / methods*
  • Computer Simulation
  • Genetic Code / genetics*
  • Genetic Variation / genetics
  • Information Storage and Retrieval / methods
  • Mice
  • Models, Genetic*
  • Quantitative Trait Loci / genetics*
  • Rats
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Species Specificity