Protein-coding potential of non-canonical open reading frames in human transcriptome

Biochem Biophys Res Commun. 2023 Dec 3;684:149040. doi: 10.1016/j.bbrc.2023.09.068. Epub 2023 Oct 13.

Abstract

In recent years, proteogenomics and ribosome profiling studies have identified a large number of proteins encoded by non-coding regions of the human genome. These proteins are encoded by small open reading frames (sORFs) located in the untranslated regions (UTRs) of mRNAs and in long non-coding RNAs (lncRNAs). Such sORF-encoded proteins (SEPs) are often shorter than 150 amino acids and show poor evolutionary conservation. A subset of them has been functionally characterized and shown to play important roles in fundamental biological processes, including cardiac and muscle function, DNA repair and embryonic development, as well as in various human diseases. How many novel protein-coding regions exist in the human genome, and what fraction of them are functionally important, remain open questions. In this review, we discuss current progress in unraveling SEPs, the approaches used to identify them, the limitations of these approaches and the reliability of the resulting identifications. We also discuss functionally characterized SEPs and their involvement in various biological processes and diseases. Lastly, we provide insights into their distinctive features compared with canonical proteins and the challenges associated with annotating them in protein reference databases.
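The length criterion mentioned above (SEPs are often shorter than 150 amino acids) can be illustrated with a naive sequence scan. The following is a minimal sketch under simplifying assumptions, not a method from the review: the function name `find_sorfs` and the 10-amino-acid lower bound are hypothetical choices, only canonical ATG start codons are considered (many sORFs are reported to initiate at near-cognate, non-AUG codons), and only the forward strand is scanned. A sequence scan alone says nothing about whether an ORF is actually translated; that evidence comes from approaches such as ribosome profiling and proteogenomics.

```python
from typing import List, Tuple

# Standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 150, min_aa: int = 10) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: report ATG-initiated ORFs on the forward strand
    whose protein length satisfies min_aa <= length < max_aa.

    Returns (start, end, aa_length) tuples, where start is the 0-based index
    of the A in ATG, end is the exclusive index just past the stop codon,
    and aa_length counts residues excluding the stop codon.
    """
    # Accept RNA input by mapping U to T.
    seq = seq.upper().replace("U", "T")
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk downstream in the same frame to the first in-frame stop.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    # No stop codon remains in this frame, so no later ATG
                    # in this frame can close an ORF either.
                    break
                aa_len = (j - i) // 3
                if min_aa <= aa_len < max_aa:
                    hits.append((i, j + 3, aa_len))
                # Resume after the stop codon; in-frame ATGs nested inside
                # this ORF are not reported separately.
                i = j + 3
                continue
            i += 3
    return sorted(hits)


if __name__ == "__main__":
    # An 11-residue ORF (Met + 10 Ala) followed by a TAA stop.
    print(find_sorfs("ATG" + "GCT" * 10 + "TAA"))
```

Reporting only the first ATG per stop codon gives the longest ORF for each stop, which keeps the output simple; a fuller pipeline would also scan the reverse strand and allow near-cognate starts.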

Keywords: Non-coding RNAs; Novel proteins; Protein-coding potential; SEPs.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Open Reading Frames / genetics
  • RNA, Long Noncoding* / genetics
  • RNA, Messenger / genetics
  • Reproducibility of Results
  • Transcriptome* / genetics

Substances

  • RNA, Long Noncoding
  • RNA, Messenger