Eukaryotic Genomes Show Strong Evolutionary Conservation of k-mer Composition and Correlation Contributions between Introns and Intergenic Regions

Genes (Basel). 2021 Oct 1;12(10):1571. doi: 10.3390/genes12101571.

Abstract

Several strongly conserved DNA sequence patterns in and between introns and intergenic regions (IIRs) consisting of short tandem repeats (STRs) with repeat lengths <3 bp have already been described in the kingdom of Animalia. In this work, we expanded the search and analysis of conserved DNA sequence patterns to a wider range of eukaryotic genomes. Our aims were to confirm the conservation of these patterns, to support the hypothesis on their functional constraints and/or the identification of unknown patterns. We pairwise compared genomic DNA sequences of genes, exons, CDS, introns and intergenic regions of 34 Embryophyta (land plants), 30 Protista and 29 Fungi using established k-mer-based (alignment-free) comparison methods. Additionally, the results were compared with values derived for Animalia in former studies. We confirmed strong correlations between the sequence structures of IIRs spanning over the entire domain of Eukaryotes. We found that the high correlations within introns, intergenic regions and between the two are a result of conserved abundancies of STRs with repeat units ≤2 bp (e.g., (AT)n). For some sequence patterns and their inverse complementary sequences, we found a violation of equal distribution on complementary DNA strands in a subset of genomes. Looking at mismatches within the identified STR patterns, we found specific preferences for certain nucleotides stable over all four phylogenetic kingdoms. We conclude that all of these conserved patterns between IIRs indicate a shared function of these sequence structures related to STRs.

Keywords: 3D conformation of DNA; alignment-free; eukaryotes; evolution; genome; intergenic region; intron; k-mer; sequence analysis; sequence patterns; short tandem repeats (STRs).

MeSH terms

  • DNA, Intergenic / genetics*
  • Eukaryota / genetics
  • Evolution, Molecular*
  • Genome / genetics*
  • Introns / genetics*

Substances

  • DNA, Intergenic