The matrices and constraints of GT/AG splice sites of more than 1000 species/lineages

Gene. 2018 Jun 20:660:92-101. doi: 10.1016/j.gene.2018.03.031. Epub 2018 Mar 26.

Abstract

To provide a resource for the splice sites (SS) of different species, we calculated nucleotide composition matrices for about 38 million splice sites from >1000 species/lineages. Overall, the matrices are enriched for aGGTAAGT (5'SS) or (Y)6N(C/t)AG(g/a)t (3'SS); however, they are quite diverse among hundreds of species. The diverse matrices remain prominent even under sequence selection pressures, suggesting the existence of diverse constraints, as well as of diverse U snRNAs and other spliceosomal factors and/or their interactions with the splice sites. Using an algorithm to measure and compare the splice site constraints across all species, we demonstrate their distinct differences quantitatively. As an example of the resource's application to answering specific questions, we confirm that, at highly constrained positions, the presence of uncommon nucleotides is significantly associated with a transcriptome-wide increase in alternative splicing. More interestingly, across 16 species the abundance of alternative splicing varies with the average constraint index of splice sites along a bell-shaped curve. This resource will allow users to assess specific sequences/splice sites against the consensus of every Ensembl-annotated species, and to explore evolutionary changes in splice sites or their relationship to alternative splicing and transcriptome diversity. Web-search and update features are also included.
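As a rough illustration of what a nucleotide composition matrix for a splice site looks like, the following Python sketch builds per-position base frequencies from a handful of aligned 5'SS sequences and derives a consensus. The abstract does not describe the constraint algorithm, so the per-position Shannon information content used here is a stand-in chosen for illustration, not the authors' constraint index. The sequences in `toy_sites` and the function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def frequency_matrix(sites):
    """Return one {base: frequency} dict per position of equal-length sequences."""
    if not sites:
        raise ValueError("no sequences given")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sites)
        total = sum(counts[b] for b in BASES)  # symbols other than A/C/G/T are ignored
        if total == 0:
            raise ValueError(f"no A/C/G/T observed at position {pos}")
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix

def information_content(column):
    """Shannon information in bits: 0 = no base preference, 2 = invariant position.

    Illustrative stand-in only; not the constraint index of the paper.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

def consensus(matrix):
    """Most frequent base at each position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in matrix)

# Hypothetical toy 5'SS sequences (3 exonic + 5 intronic bases), not real data.
toy_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGT"]
matrix = frequency_matrix(toy_sites)

print(consensus(matrix))
for pos, col in enumerate(matrix):
    print(pos, round(information_content(col), 3))
```

With real data, the input would be the aligned splice-site windows extracted from an annotated genome, and positions with high information content would be the ones where an uncommon nucleotide stands out most against the consensus.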

Keywords: Alternative splicing; Diversity; Intron; Splice sites.

MeSH terms

  • Alternative Splicing*
  • Databases, Genetic*
  • Evolution, Molecular*
  • RNA Splice Sites*
  • RNA, Small Nuclear / genetics*

Substances

  • RNA Splice Sites
  • RNA, Small Nuclear