Deeply Mining a Universe of Peptides Encoded by Long Noncoding RNAs

Mol Cell Proteomics. 2021:20:100109. doi: 10.1016/j.mcpro.2021.100109. Epub 2021 Jun 12.

Abstract

Many small ORFs embedded in long noncoding RNA (lncRNA) transcripts have been shown to encode biologically functional polypeptides (small ORF-encoded polypeptides [SEPs]) in different organisms. Despite some novel SEPs have been found, the identification is still hampered by their poor predictability, diminutive size, and low relative abundance. Here, we take advantage of NONCODE, a repository containing the most complete collection and annotation of lncRNA transcripts from different species, to build a novel database that attempts to maximize a collection of SEPs from human and mouse lncRNA transcripts. In order to further improve SEP discovery, we implemented two effective and complementary polypeptide enrichment strategies using 30-kDa molecular weight cutoff filter and C8 solid-phase extraction column. These combined strategies enabled us to discover 353 SEPs from eight human cell lines and 409 SEPs from three mouse cell lines and eight mouse tissues. Importantly, 19 of them were then verified through in vitro expression, immunoblotting, parallel reaction monitoring, and synthetic peptides. Subsequent bioinformatics analysis revealed that some of the physical and chemical properties of these novel SEPs, including amino acid composition and codon usage, are different from those commonly found in canonical proteins. Intriguingly, nearly 65% of the identified SEPs were found to be initiated with non-AUG start codons. The 762 novel SEPs probably represent the largest number of SEPs detected by MS reported to date. These novel SEPs might not only provide new clues for the annotation of noncoding elements in the genome but also serve as a valuable resource for functional study.

Keywords: MS; NONCODE database; enrichment; long noncoding RNA; smORF-encoded polypeptides.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cell Line
  • Female
  • Humans
  • Male
  • Mass Spectrometry
  • Mice
  • Mice, Inbred C57BL
  • Open Reading Frames*
  • Peptides*
  • RNA, Long Noncoding*

Substances

  • Peptides
  • RNA, Long Noncoding