Open reading frame dominance indicates protein-coding potential of RNAs

EMBO Rep. 2022 Jun 7;23(6):e54321. doi: 10.15252/embr.202154321. Epub 2022 Apr 19.

Abstract

Recent studies have identified numerous RNAs with both coding and noncoding functions. However, the sequence characteristics that determine this bifunctionality remain largely unknown. In the present study, we develop and test the open reading frame (ORF) dominance score, which we define as the length of the longest ORF divided by the sum of the lengths of all putative ORFs in a transcript. This score correlates with translation efficiency in coding transcripts and with translation of noncoding RNAs. In bacteria and archaea, coding and noncoding transcripts have narrow distributions of high and low ORF dominance, respectively, whereas those of eukaryotes show broader ORF dominance distributions, with considerable overlap between coding and noncoding transcripts. The extent of this overlap correlates positively with the mutation rate of genomes and negatively with the effective population size of species. Tissue-specific transcripts show higher ORF dominance than ubiquitously expressed transcripts, and the majority of tissue-specific transcripts are expressed in mature testes. These data suggest that the decrease in population size and the emergence of testes in eukaryotic organisms allowed for the evolution of potentially bifunctional RNAs.
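
The ORF dominance score defined in the abstract can be illustrated with a minimal Python sketch. The abstract gives only the ratio itself, so everything about ORF detection here is an assumption: an ORF is taken to run from an ATG to the first in-frame stop codon, only the three forward frames are scanned, nested start codons are ignored, and no minimum length is applied. The function names `find_orf_lengths` and `orf_dominance` are hypothetical, and the study's actual ORF-calling procedure may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_lengths(seq):
    """Return nucleotide lengths of putative ORFs on the given strand.

    Assumption (not from the abstract): an ORF starts at the first ATG
    seen in a frame and ends at the next in-frame stop codon, inclusive.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append(i + 3 - start)
                start = None
    return lengths


def orf_dominance(seq):
    """Longest ORF length divided by the summed length of all putative ORFs.

    Returns None when no ORF is found (a choice made for this sketch).
    """
    lengths = find_orf_lengths(seq)
    if not lengths:
        return None
    return max(lengths) / sum(lengths)


if __name__ == "__main__":
    # Frame 0 holds a 9-nt ORF (ATGAAATAA); frame 1 holds a 6-nt ORF (ATGTGA).
    print(orf_dominance("ATGAAATAACATGTGA"))
```

Under these assumptions, a transcript with a single ORF scores 1.0, and the score falls as additional ORFs contribute to the denominator.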

Keywords: ORF dominance; gene birth; molecular evolution; noncoding RNA; protein-coding potential.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome
  • Open Reading Frames / genetics
  • Proteins* / genetics
  • RNA*
  • RNA, Untranslated / genetics

Substances

  • Proteins
  • RNA, Untranslated
  • RNA

Associated data

  • figshare/10.6084/m9.figshare.7269500
  • figshare/10.6084/m9.figshare.7269518