Predicting the sizes of large RNA molecules

Proc Natl Acad Sci U S A. 2008 Oct 21;105(42):16153-8. doi: 10.1073/pnas.0808089105. Epub 2008 Oct 9.

Abstract

We present a theory of the dependence on sequence of the three-dimensional size of large single-stranded (ss) RNA molecules. The work is motivated by the fact that the genomes of many viruses are large ssRNA molecules, often several thousand nucleotides long, and that these RNAs are spontaneously packaged into small rigid protein shells. We argue that there has been evolutionary pressure for the genome to have overall spatial properties (including an appropriate radius of gyration, R_g) that facilitate this assembly process. For an arbitrary RNA sequence, we introduce the (thermal) average maximum ladder distance (MLD) and use it as a measure of the "extendedness" of the RNA secondary structure. The MLD values of viral ssRNAs that package into capsids of fixed size are shown to be consistently smaller than those for randomly permuted sequences of the same length and base composition, and also smaller than those of natural ssRNAs that are not under evolutionary pressure to have a compact native form. By mapping these secondary structures onto a linear polymer model and by using MLD as a measure of effective contour length, we predict the R_g values of viral ssRNAs are smaller than those of nonviral sequences. More generally, we predict the average MLD values of large nonviral ssRNAs scale as N^(0.67 ± 0.01), where N is the number of nucleotides, and that their R_g values vary as MLD^0.5 in an ideal solvent, and hence as N^0.34. An alternative analysis, which explicitly includes all branches, is introduced and shown to yield consistent results.
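
As a rough illustration of the quantity described above, the sketch below computes a maximum ladder distance for one secondary structure. It rests on several assumptions that are not stated in the abstract: the structure is given in pseudoknot-free dot-bracket notation, the ladder distance between two positions is taken to be the number of base pairs crossed on the path between them, and the MLD is therefore the longest path in the tree whose nodes are loops and whose edges are base pairs. The function name `max_ladder_distance` is hypothetical. The abstract's MLD is a thermal average over an ensemble of structures, which this single-structure sketch does not attempt. Under the abstract's scaling, the two exponents combine as 0.67 × 0.5 ≈ 0.34.

```python
def max_ladder_distance(dot_bracket: str) -> int:
    """Longest path, counted in base pairs crossed, for one structure.

    Assumption: the structure is a pseudoknot-free dot-bracket string.
    Each loop is a tree node and each base pair is a tree edge, so the
    result is the diameter of that tree.
    """
    # Each stack entry stores the two largest branch depths found so far
    # beneath one open loop. The first entry is the exterior loop (root).
    stack = [[0, 0]]
    best = 0
    for ch in dot_bracket:
        if ch == "(":
            stack.append([0, 0])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: extra ')'")
            first, second = stack.pop()
            # Longest path that bends inside the loop just closed.
            best = max(best, first + second)
            # Crossing this base pair adds one step toward the parent loop.
            depth = first + 1
            parent = stack[-1]
            if depth > parent[0]:
                parent[0], parent[1] = depth, parent[0]
            elif depth > parent[1]:
                parent[1] = depth
        elif ch != ".":
            raise ValueError(f"unexpected character: {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    root = stack[0]
    return max(best, root[0] + root[1])


if __name__ == "__main__":
    # Toy structures chosen for illustration only, not taken from the paper.
    for structure in ["....", "((..))", "((..))((..))", "(((..)((...))))"]:
        print(structure, max_ladder_distance(structure))
```

A thermally averaged value could in principle be estimated by applying the same function to structures sampled from a folding program's ensemble and averaging the results; that step is left out here because it depends on the folding tool used.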

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Base Sequence
  • Models, Molecular
  • Molecular Sequence Data
  • Nucleic Acid Conformation*
  • RNA / chemistry*

Substances

  • RNA