Identification of protein-coding sequences using the hybridization of 18S rRNA and mRNA during translation

Nucleic Acids Res. 2009 Feb;37(2):591-601. doi: 10.1093/nar/gkn917. Epub 2008 Dec 10.

Abstract

In this article, we introduce a new approach for distinguishing protein-coding sequences from non-coding sequences, based on a period-3 free-energy signal that arises from the interactions of the 3'-terminal nucleotides of the 18S rRNA with mRNA. We extracted features of the amplitude and phase of the period-3 signal that are specific to protein-coding regions and absent from non-coding regions, and used them to separate the two classes of sequence. We tested the method on all the experimental genes from Saccharomyces cerevisiae and Schizosaccharomyces pombe. The identifications were consistent with the corresponding GenBank annotations, and the method performed better than existing methods that use a period-3 signal. Preliminary tests on some fly, mouse and human genes suggest that our method is applicable to higher eukaryotic genes. Tests on pseudogenes indicated that most pseudogenes have no period-3 signal. Further exploration of the 3'-tail of 18S rRNA and pattern analysis of protein-coding sequences supported our assumption that the 3'-tail of 18S rRNA plays a synchronizing role throughout the translation elongation process, which in turn can be exploited to identify protein-coding sequences.
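The abstract describes the signal only at a high level. The following is a minimal sketch under stated assumptions, not the authors' method: it slides the 3'-tail of the 18S rRNA along an mRNA one nucleotide at a time, scores each alignment, and then extracts the amplitude and phase of the period-3 component with a single-frequency discrete Fourier transform at frequency 1/3. Assumptions: `toy_hybridization_energy` is a hypothetical stand-in that counts Watson-Crick and G-U pairs (-1 each) in an ungapped antiparallel alignment, not a real nearest-neighbor free-energy model; the tail sequence `AUCAUUA` in the example is illustrative; all function names are hypothetical.

```python
import cmath
import math

# Assumption: toy pairing rules (Watson-Crick plus G-U wobble).
# This is NOT a nearest-neighbor free-energy table.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_hybridization_energy(mrna_window: str, rrna_tail: str) -> float:
    """Hypothetical proxy for hybridization free energy.

    Both strands are written 5'->3', so they pair antiparallel:
    mRNA position i faces tail position len(tail) - 1 - i.
    Each complementary pair contributes -1; more negative = stronger binding.
    """
    n = len(rrna_tail)
    score = 0.0
    for i in range(n):
        if (mrna_window[i], rrna_tail[n - 1 - i]) in PAIRS:
            score -= 1.0
    return score


def free_energy_signal(mrna: str, rrna_tail: str) -> list[float]:
    """Score the tail against every mRNA position (step of one nucleotide)."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(rrna_tail)
    return [toy_hybridization_energy(mrna[k:k + n], rrna_tail)
            for k in range(len(mrna) - n + 1)]


def period3_component(signal: list[float]) -> tuple[float, float]:
    """Amplitude and phase of the DFT coefficient at frequency 1/3."""
    if not signal:
        raise ValueError("signal is empty (mRNA shorter than the rRNA tail)")
    mean = sum(signal) / len(signal)  # remove the DC offset
    coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k / 3)
                for k, x in enumerate(signal))
    amplitude = abs(coeff) / len(signal)
    phase = cmath.phase(coeff)  # radians, in (-pi, pi]
    return amplitude, phase


if __name__ == "__main__":
    sig = free_energy_signal("GCA" * 20, "AUCAUUA")
    print(sig[:6])
    print(period3_component(sig))
```

For a sequence built from repeats of `GCA`, the toy signal repeats every three positions and the period-3 amplitude is nonzero, whereas a homopolymer such as `AAAA…` gives a flat signal and zero amplitude. A more realistic version would substitute nearest-neighbor hybridization free energies (for example from a package such as ViennaRNA) and would likely evaluate the component over sliding windows rather than over the whole sequence.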

MeSH terms

  • Base Pairing
  • Computational Biology / methods
  • Open Reading Frames*
  • Peptide Chain Elongation, Translational
  • RNA, Messenger / chemistry*
  • RNA, Ribosomal, 18S / chemistry*
  • Saccharomyces cerevisiae Proteins / genetics
  • Schizosaccharomyces pombe Proteins / genetics
  • Sequence Analysis, RNA / methods*

Substances

  • RNA, Messenger
  • RNA, Ribosomal, 18S
  • Saccharomyces cerevisiae Proteins
  • Schizosaccharomyces pombe Proteins