Features of coding and noncoding sequences based on 3-tuple distributions

Yi Chuan Xue Bao. 2005 Oct;32(10):1018-26.

Abstract

The origin of non-coding sequences, especially introns, is an outstanding issue that has been debated continuously for the last two decades. In the current work we use a mathematical model to characterize DNA sequences and find that the 3-tuple distributions in the different reading frames of a given coding sequence differ sharply from one another, whereas in introns and other non-coding sequences they are almost identical. Symmetric relative entropies (SREs) decrease progressively from the coding sequences of primitive prokaryotes to those of advanced eukaryotes, and from the non-coding sequences of lower eukaryotes to those of higher eukaryotes, with a correlation coefficient of 0.86. In silico evolution experiments show that SREs typical of higher eukaryotic introns can be reached from prokaryotic coding sequences once the mutation ratio reaches 2/100 (2%). A total of 25 introns from the three genomes searched (S. pombe, C. elegans and H. sapiens) share high sequence identity with coding regions, which indicates that at least some introns may have originated directly from coding sequences (CDS). We suggest that SREs may be a useful feature for evolutionary studies.
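The abstract does not state the exact formula for the SRE, so the following is only a minimal sketch under stated assumptions. It assumes the SRE between two reading frames is the symmetrized Kullback-Leibler divergence D(P||Q) + D(Q||P) of their 3-tuple (trinucleotide) frequency distributions, averaged over the three frame pairs, with a pseudocount of 1 added to every 3-tuple so the logarithms stay finite. The "in silico evolution" step is modeled as uniform random point substitutions at a fixed per-site rate, which is our assumption and not necessarily the authors' protocol. The function names `frame_distribution`, `symmetric_relative_entropy` and `mutate` are hypothetical, and the toy sequences are illustrative, not data from the study.

```python
import itertools
import math
import random

BASES = "ACGT"
# All 64 possible 3-tuples (trinucleotides).
TRIPLETS = ["".join(t) for t in itertools.product(BASES, repeat=3)]


def frame_distribution(seq, frame, pseudocount=1.0):
    """Relative frequencies of 3-tuples read in one frame (0, 1 or 2).

    Assumption: a pseudocount is added to every 3-tuple so no probability is zero.
    """
    counts = {t: pseudocount for t in TRIPLETS}
    for i in range(frame, len(seq) - 2, 3):
        triplet = seq[i:i + 3]
        if triplet in counts:  # skip 3-tuples containing N or other symbols
            counts[triplet] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[t] * math.log2(p[t] / q[t]) for t in TRIPLETS)


def symmetric_relative_entropy(seq, pseudocount=1.0):
    """Assumed SRE: mean of D(p||q) + D(q||p) over the three reading-frame pairs."""
    dists = [frame_distribution(seq, f, pseudocount) for f in range(3)]
    pairs = list(itertools.combinations(dists, 2))
    return sum(relative_entropy(p, q) + relative_entropy(q, p) for p, q in pairs) / len(pairs)


def mutate(seq, rate, rng):
    """Hypothetical mutation model: each site is replaced by a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "coding-like" sequence with a strong 3-periodic pattern (hypothetical data).
    coding_like = "ATGGCTGAA" * 200
    # Toy "non-coding-like" sequence of uniformly random bases (hypothetical data).
    noncoding_like = "".join(rng.choice(BASES) for _ in range(1800))
    print("coding-like SRE:    ", round(symmetric_relative_entropy(coding_like), 3))
    print("non-coding-like SRE:", round(symmetric_relative_entropy(noncoding_like), 3))
    # Crude in silico evolution: repeated rounds of 2% point substitutions.
    seq = coding_like
    for generation in range(1, 6):
        seq = mutate(seq, 0.02, rng)
        print(f"round {generation}: SRE = {symmetric_relative_entropy(seq):.3f}")
```

Under these assumptions, a sequence whose 3-tuple usage depends strongly on the frame (as in the coding-like toy) gets a large SRE, while a sequence whose frames look alike gets an SRE close to zero. Repeated substitution rounds erode the frame-specific signal and lower the SRE, which mirrors the direction of the trend described in the abstract.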

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Entropy
  • Eukaryotic Cells / cytology
  • Eukaryotic Cells / metabolism*
  • Evolution, Molecular
  • Genome*
  • Genome, Bacterial
  • Genome, Fungal
  • Genome, Helminth
  • Genome, Human
  • Humans
  • Introns / genetics*
  • Models, Genetic
  • Open Reading Frames
  • Prokaryotic Cells / cytology
  • Prokaryotic Cells / metabolism*
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods