Sequence of the region coding for virion proteins C and E2 and the carboxy terminus of the nonstructural proteins of rubella virus: comparison with alphaviruses

Gene. 1988;62(1):85-99. doi: 10.1016/0378-1119(88)90582-3.

Abstract

The sequence of the 3' 4508 nucleotides (nt) of the genomic RNA of the Therien strain of rubella virus (RV) was determined from cDNA clones. The sequence contains a 3189-nt open reading frame (ORF) which codes for the structural proteins C, E2 and E1. C is predicted to have a length of 300 amino acids (aa). The N-terminal half of the C protein is highly basic and hydrophilic in nature, and is putatively the region of the protein which interacts with the virion RNA. At the C terminus of the C protein is a stretch of 20 hydrophobic aa which also serves as the signal sequence for E2, indicating that the cleavage of C from the polyprotein precursor may be catalyzed by signalase in the lumen of the endoplasmic reticulum. E2 is 282 aa in length and contains four potential N-linked glycosylation sites and a putative transmembrane domain near its C terminus. The sequence of E1 has been previously described [Frey et al., Virology 154 (1986) 228-232]. No homology could be detected between the amino acid sequence of the RV structural proteins and the amino acid sequence of the alphavirus structural proteins. From the position of a region of 30 nt in the RV genomic sequence which exhibited significant homology with the sequence in the alphavirus genome at which subgenomic RNA synthesis is initiated, the RV subgenomic RNA is predicted to be 3346 nt in length and the nontranslated region from the 5' end of the subgenomic RNA to the structural protein ORF is predicted to be 98 nt. A 1407-nt ORF in a different translation frame, beginning at the 5' end of the RV nt sequence reported here, is the C-terminal region of the nonstructural protein ORF. This ORF overlaps the structural protein ORF by 149 nt.
A low level of homology could be detected between the predicted amino acid sequence of the C terminus of the RV nonstructural protein ORF and the replicase proteins of several positive-strand RNA viruses of animals and plants, including nsp4 of the alphaviruses, the protein encoded by the C-terminal region of the alphavirus nonstructural ORF. However, the overall homology between RV and the alphaviruses in this region of the genome was only 18%, indicating that these two genera of the Togavirus family are only distantly related. Intriguingly, there is a 2844-nt ORF present in the negative polarity orientation of the RV sequence which could encode a 928-aa polyprotein.
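
The lengths quoted in the abstract can be cross-checked with a little coordinate arithmetic. The following is a minimal sketch, not the authors' method. It assumes that the 4508 nt are numbered 1 to 4508 from the 5' end of the reported sequence, that the subgenomic RNA is 3' co-terminal with that sequence, that any poly(A) tract is ignored, and that the 3189-nt ORF length excludes the stop codon. All variable names are illustrative, and the E1 length it derives is implied by these assumptions rather than stated in the abstract.

```python
# Figures quoted in the abstract (nucleotides unless marked as amino acids).
SEQUENCED_NT = 4508          # 3'-terminal region that was sequenced
STRUCTURAL_ORF_NT = 3189     # ORF coding for C, E2 and E1
C_AA = 300                   # predicted length of C
E2_AA = 282                  # predicted length of E2
SUBGENOMIC_NT = 3346         # predicted subgenomic RNA length
SUBGENOMIC_5_NTR_NT = 98     # 5' nontranslated region of the subgenomic RNA


def codons(nt_length: int) -> int:
    """Return the number of codons, insisting on a whole number of triplets."""
    count, remainder = divmod(nt_length, 3)
    if remainder:
        raise ValueError(f"{nt_length} nt is not a multiple of 3")
    return count


# Codons in the structural ORF (assumption: stop codon not counted).
structural_codons = codons(STRUCTURAL_ORF_NT)

# Residues left for E1 once C and E2 are subtracted (assumption: the three
# proteins account for the whole ORF with no residues lost in processing).
e1_aa_implied = structural_codons - C_AA - E2_AA

# 1-based coordinates within the 4508-nt sequence (assumption: the
# subgenomic RNA ends at the 3' end of the sequenced region).
subgenomic_start = SEQUENCED_NT - SUBGENOMIC_NT + 1
structural_orf_start = subgenomic_start + SUBGENOMIC_5_NTR_NT
structural_orf_end = structural_orf_start + STRUCTURAL_ORF_NT - 1

# Nucleotides remaining 3' of the structural ORF (stop codon included).
three_prime_remainder = SEQUENCED_NT - structural_orf_end

print("structural ORF codons:", structural_codons)
print("implied E1 length (aa):", e1_aa_implied)
print("subgenomic RNA start:", subgenomic_start)
print("structural ORF span:", structural_orf_start, "-", structural_orf_end)
print("nt 3' of structural ORF:", three_prime_remainder)
```

Under these assumptions the 5' nontranslated region, the ORF and the 3' remainder sum to the predicted subgenomic length, which is a quick consistency check on the quoted figures.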

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alphavirus / genetics
  • Amino Acid Sequence
  • Base Sequence
  • Codon
  • DNA / genetics
  • Genes
  • Genes, Viral*
  • Glycoproteins / genetics
  • Molecular Sequence Data
  • RNA, Viral / genetics
  • Rubella virus / genetics*
  • Sequence Homology, Nucleic Acid
  • Viral Proteins / genetics*
  • Virion / genetics*

Substances

  • Codon
  • Glycoproteins
  • RNA, Viral
  • Viral Proteins
  • DNA

Associated data

  • GENBANK/M15240
  • GENBANK/M18901