Conserved sequence features of inteins (protein introns) and their use in identifying new inteins and related proteins

Protein Sci. 1994 Dec;3(12):2340-50. doi: 10.1002/pro.5560031218.

Abstract

Inteins (protein introns) are internal portions of protein sequences that are posttranslationally excised while the flanking regions are spliced together, making an additional protein product. Inteins have been found in a number of homologous genes in yeast, mycobacteria, and extremely thermophilic archaebacteria. The inteins are probably multifunctional: they autocatalyze their own splicing, and some have also been shown to be DNA endonucleases. The splice junction regions and two regions similar to homing endonucleases were thought to be the only common sequence features of inteins. This work analyzed all published intein sequences with recently developed methods for detecting weak, conserved sequence features. The methods complemented each other in the identification and assessment of several patterns characterizing the intein sequences. New conserved intein features are discovered, and the known ones are quantitatively described and localized. A general sequence description of all the known inteins is derived from the motifs and their relative positions. This description is used to search the sequence databases for intein-like proteins. A sequence region in a mycobacterial open reading frame that possesses all of the intein motifs, and that is absent from sequences homologous to both of its flanking sequences, is identified as an intein. A newly discovered putative intein in red algae chloroplasts is found not to contain the endonuclease motifs present in all other inteins. The yeast HO endonuclease is found to have an overall intein-like structure, and a few viral polyprotein cleavage sites are found to be significantly similar to the intein amino-end splice junction motif. The intein features described may serve for detection of intein sequences.
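The idea of describing inteins by a set of motifs and their relative positions, and then searching sequences with that description, can be illustrated with a minimal sketch. It is not the method used in this work, which the abstract does not specify beyond "recently developed methods". The regular expressions, the motif names, the `max_gap` threshold, and the function `find_ordered_motifs` are all hypothetical placeholders chosen for illustration; they are not the motifs derived in the paper. The scan is a simple greedy left-to-right search and may miss valid chains that a more thorough search would find.

```python
import re

# Placeholder patterns for illustration only. These are NOT the motifs
# reported in the paper; real motifs would be position-specific profiles.
MOTIFS = [
    ("N_junction", r"C[A-Z]{2}G"),
    ("endo_1", r"LAG[A-Z]{2}DG"),
    ("C_junction", r"HN[CST]"),
]


def find_ordered_motifs(seq, motifs=MOTIFS, max_gap=600):
    """Greedy scan: return [(name, start, end), ...] if every motif is found
    in the given order, each within max_gap residues of the previous hit.
    Return None otherwise."""
    hits = []
    pos = 0
    for name, pattern in motifs:
        # Search only downstream of the previous motif hit.
        match = re.compile(pattern).search(seq, pos)
        if match is None:
            return None
        # Enforce the (assumed) spacing constraint between adjacent motifs.
        if hits and match.start() - hits[-1][2] > max_gap:
            return None
        hits.append((name, match.start(), match.end()))
        pos = match.end()
    return hits
```

Calling `find_ordered_motifs(protein_sequence)` on a candidate open reading frame would return the motif coordinates when all placeholder patterns occur in order, which could then be compared against homologs lacking the intervening region.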

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / chemistry*
  • DNA Helicases*
  • Deoxyribonucleases, Type II Site-Specific / chemistry
  • DnaB Helicases
  • Fungal Proteins / chemistry
  • Introns*
  • Molecular Sequence Data
  • Mycobacterium leprae / chemistry
  • Open Reading Frames
  • Plant Proteins / chemistry
  • Protein Processing, Post-Translational*
  • Proteins / chemistry*
  • Proteins / metabolism
  • Rhodophyta / chemistry
  • Saccharomyces cerevisiae / chemistry
  • Saccharomyces cerevisiae Proteins
  • Sequence Alignment
  • Sequence Homology, Amino Acid

Substances

  • Bacterial Proteins
  • Fungal Proteins
  • Plant Proteins
  • Proteins
  • Saccharomyces cerevisiae Proteins
  • pps1 protein, Mycobacterium sp.
  • HO protein, S cerevisiae
  • SCEI protein, S cerevisiae
  • Deoxyribonucleases, Type II Site-Specific
  • DNA Helicases
  • DnaB Helicases

Associated data

  • GENBANK/D29671
  • GENBANK/M62622
  • GENBANK/M64984
  • GENBANK/U00013
  • GENBANK/U00707
  • GENBANK/X55026
  • GENBANK/X73822
  • SWISSPROT/P05769
  • SWISSPROT/P06935
  • SWISSPROT/P09932
  • SWISSPROT/P17255
  • SWISSPROT/P26345
  • SWISSPROT/P30317
  • SWISSPROT/P32886