Identification and analysis of small proteins and short open reading frame encoded peptides in Hep3B cell

J Proteomics. 2021 Jan 6:230:103965. doi: 10.1016/j.jprot.2020.103965. Epub 2020 Sep 3.

Abstract

Small proteins and short open reading frame-encoded peptides (SEPs) are of fundamental importance because of their essential roles in biological processes. However, annotating or identifying them is challenging, owing in part to the limitations of traditional genome annotation pipelines and to their inherently low abundance and low molecular weight. To discover and characterize SEPs in the Hep3B cell line, we developed an optimized peptidomic assay by combining different peptide extraction and separation methods. The organic solvent precipitation method improved the enrichment of low-molecular-weight proteins and peptides, and the data clearly showed a beneficial effect from the reduction of sample complexity, resulting in high-quality MS/MS spectra. Furthermore, the different strategies were complementary, increasing both the total number of small proteins identified and their sequence coverage. In total, 1192 proteins of fewer than 100 amino acids were identified, including 271 newly discovered SEPs that have been annotated in the OpenProt database, 147 of which are encoded by ncRNA or lincRNA. Results in this work provide robust evidence that the human proteome is more complicated than previously appreciated, and this will benefit the discovery of proteins without functional annotation.

SIGNIFICANCE: In this work, methods were optimized to identify SEPs in Hep3B. Organic solvent precipitation improved the enrichment of low-molecular-weight proteins and peptides, and the data clearly showed a beneficial effect from the reduction of sample complexity, resulting in high-quality MS/MS spectra. The different strategies were complementary, increasing both the total number of small proteins identified and their sequence coverage. In total, 1192 proteins of fewer than 100 amino acids were identified, including 271 newly discovered SEPs that have been annotated in the OpenProt database, 147 of which are encoded by ncRNA or lincRNA. Furthermore, 22 SEPs generated from uORFs may have a potential effect on translation control, and 149 newly identified SEPs have known functional domains or cross-species conservation. Results in this work present robust evidence for the coding potential of previously ignored regions of the human genome and may provide additional insights into tumor biology.
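
The length criterion reported above (proteins of fewer than 100 amino acids) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the FASTA-style input, and the example sequences are hypothetical and serve only to show how such a cutoff might be applied to a list of identified protein sequences.

```python
from typing import Dict, Iterable

# Cutoff taken from the abstract: proteins of fewer than 100 amino acids.
MAX_LEN = 100


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {identifier: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            parts = line[1:].split()
            header = parts[0] if parts else "unnamed"
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[header] += line
    return records


def small_proteins(records: Dict[str, str], max_len: int = MAX_LEN) -> Dict[str, str]:
    """Keep only sequences strictly shorter than max_len residues."""
    return {name: seq for name, seq in records.items() if len(seq) < max_len}


# Made-up example records, not data from the study.
example = [
    ">hypothetical_sep_1 demo entry",
    "MKTLLVAG",
    "AAR",
    ">hypothetical_long_1",
    "M" + "A" * 120,
]
selected = small_proteins(parse_fasta(example))
print(sorted(selected))
```

The comparison is strict (`<`), so a sequence of exactly 100 residues is excluded, matching the "fewer than 100 amino acids" wording.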

Keywords: Acetonitrile precipitation; Hep3B cell line; Peptidomic; SEP enrichment; Short open reading frames; sORF-encoded peptides.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome, Human
  • Humans
  • Open Reading Frames
  • Peptides* / genetics
  • Proteome / genetics
  • Tandem Mass Spectrometry*

Substances

  • Peptides
  • Proteome