A genome-wide survey of RS domain proteins

RNA. 2001 Dec;7(12):1693-701.

Abstract

Domains rich in alternating arginine and serine residues (RS domains) are frequently found in metazoan proteins involved in pre-mRNA splicing. The RS domains of splicing factors associate with each other and are important for the formation of protein-protein interactions required for both constitutive and regulated splicing. The prevalence of the RS domain in splicing factors suggests that it might serve as a useful signature for the identification of new proteins that function in pre-mRNA processing, although it remains to be determined whether RS domains also participate in other cellular functions. Using database search and sequence clustering methods, we have identified and categorized RS domain proteins encoded within the entire genomes of Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae. This genome-wide survey revealed a surprising complexity of RS domain proteins in metazoans with functions associated with chromatin structure, transcription by RNA polymerase II, cell cycle, and cell structure, as well as pre-mRNA processing. Also identified were RS domain proteins in S. cerevisiae with functions associated with cell structure, osmotic regulation, and cell cycle progression. The results thus demonstrate an effective strategy for the genomic mining of RS domain proteins. The identification of many new proteins using this strategy has provided a database of factors that are candidates for forming RS domain-mediated interactions associated with different steps in pre-mRNA processing, in addition to other cellular functions.
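The abstract does not state the search criteria used in the survey, so the following is only a minimal sketch of how a first-pass screen for alternating arginine/serine stretches might look. The function names, the 8-residue threshold, and the toy sequences are all hypothetical and are not taken from the paper; the sequence clustering step is not covered.

```python
def find_alternating_rs_runs(sequence, min_length=8):
    """Return (start, end) half-open index pairs for stretches of strictly
    alternating R and S residues that are at least min_length long.

    min_length=8 is an assumed, illustrative cutoff, not the paper's value.
    """
    seq = sequence.upper()
    runs = []
    start = None  # index where the current alternating run began, if any
    for i, aa in enumerate(seq):
        is_rs = aa in "RS"
        # Extend the current run when this residue is R or S and differs
        # from the previous residue (which is also R or S by construction).
        if is_rs and start is not None and aa != seq[i - 1]:
            continue
        # Otherwise the run (if any) ends just before position i.
        if start is not None and i - start >= min_length:
            runs.append((start, i))
        start = i if is_rs else None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(seq) - start >= min_length:
        runs.append((start, len(seq)))
    return runs


def screen_proteome(proteins, min_length=8):
    """Keep only the entries of a {name: sequence} mapping that contain
    at least one qualifying alternating R/S run."""
    hits = {}
    for name, seq in proteins.items():
        runs = find_alternating_rs_runs(seq, min_length)
        if runs:
            hits[name] = runs
    return hits


if __name__ == "__main__":
    # Made-up toy sequences for illustration only; not real proteins.
    toy = {
        "toy_rs_protein": "MARSRSRSRSRSK",
        "toy_control": "MAKRRSSLLKRS",
    }
    print(screen_proteome(toy))
```

A real survey would read sequences from a proteome file and would likely tolerate interruptions in the repeat and score overall R/S composition; strict alternation is used here only to keep the sketch short.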

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs / genetics*
  • Animals
  • Arginine / genetics
  • Caenorhabditis elegans / genetics
  • Cell Cycle
  • Chromatin / metabolism
  • Computational Biology / methods*
  • Drosophila melanogaster / genetics
  • Evolution, Molecular
  • Genome
  • Humans
  • Molecular Biology / methods*
  • Phosphoprotein Phosphatases
  • Protein Kinases
  • Protein Structure, Tertiary / genetics*
  • RNA Polymerase II / metabolism
  • RNA Processing, Post-Transcriptional
  • Research Design
  • Saccharomyces cerevisiae / genetics
  • Serine / genetics
  • Transcription, Genetic

Substances

  • Chromatin
  • Serine
  • Arginine
  • Protein Kinases
  • RNA Polymerase II
  • Phosphoprotein Phosphatases