Identification of unannotated coding sequences and their physiological functions

J Biochem. 2023 Mar 31;173(4):237-242. doi: 10.1093/jb/mvac064.

Abstract

Most protein-coding sequences (CDSs) are predicted on the basis of criteria such as a length sufficient to encode a product of at least 100 amino acids and translation initiation at an AUG codon. However, recent studies based on ribosome profiling and mass spectrometry have shown that several RNAs annotated as long noncoding RNAs (lncRNAs) are actually translated to generate polypeptides of fewer than 100 amino acids, and that many proteins are translated from near-cognate initiation codons such as CUG and GUG. Furthermore, studies of genetically engineered mouse models have revealed that such polypeptides and proteins contribute to diverse physiological processes. In this review, we describe the latest methods for the identification of unannotated CDSs and provide examples of their physiological functions.
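
To illustrate why conventional prediction criteria can miss such CDSs, the following is a minimal sketch of a toy open reading frame (ORF) scanner. It is not the method of any annotation pipeline or of the review. The function name find_orfs and its parameters are hypothetical; only the 100-amino-acid threshold, the AUG start codon, and the near-cognate codons CUG and GUG come from the abstract. The scanner reads the forward strand only and reports the first in-frame start codon before each stop codon.

```python
# Toy ORF scanner (illustrative assumption, not an annotation tool).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, start_codons=("AUG",), min_codons=100):
    """Return (start, end, n_codons) for each forward-strand ORF.

    start and end are 0-based, end-exclusive, and end includes the stop codon.
    n_codons counts sense codons including the start codon, excluding the stop.
    """
    rna = rna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return orfs


# A short AUG-initiated ORF (11 codons) fails the 100-codon criterion.
small_orf = "AUG" + "GCC" * 10 + "UGA"
conventional_small = find_orfs(small_orf)
relaxed_small = find_orfs(small_orf, min_codons=1)

# A CUG-initiated ORF is invisible when only AUG counts as a start codon.
cug_orf = "GG" + "CUG" + "GCU" * 5 + "UAA"
conventional_cug = find_orfs(cug_orf, min_codons=1)
relaxed_cug = find_orfs(cug_orf, start_codons=("AUG", "CUG", "GUG"), min_codons=1)
```

With the default criteria both toy sequences yield no ORF, whereas relaxing the length threshold or the set of start codons recovers them. Sequence scanning alone cannot show that an ORF is translated, which is why experimental evidence such as ribosome profiling and mass spectrometry is needed.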

Keywords: long noncoding RNA (lncRNA); near-cognate initiation codon; polypeptide; ribosome profiling; translation.

Publication types

  • Review

MeSH terms

  • Amino Acids* / metabolism
  • Animals
  • Codon, Initiator
  • Mice
  • Peptides* / genetics
  • Peptides* / metabolism
  • Protein Biosynthesis
  • RNA, Messenger / metabolism

Substances

  • RNA, Messenger
  • Codon, Initiator
  • Peptides
  • Amino Acids