Proteogenomic Approach to UTR Peptide Identification

J Proteome Res. 2020 Jan 3;19(1):212-220. doi: 10.1021/acs.jproteome.9b00498. Epub 2019 Dec 5.

Abstract

Recent sequencing technologies have highlighted the translation of untranslated regions (UTRs) in genomes, although it remains unknown whether the translated products persist in the cell. Here, we propose a proteogenomic approach to UTR peptide identification at the proteome level, which has been challenging due to the lack of the corresponding sequences required for peptide-spectrum matching. We address this challenge by constructing a translated UTR (tUTR) database consisting of all hypothetical sequences that can be translated from UTRs, assuming non-AUG initiation at near-cognate start codons and stop codon readthrough. In the analysis of the H1299 cell line tandem mass spectrometry (MS/MS) dataset, the tUTR database-based proteogenomic approach enabled the detection of 52 5'-UTR and 9 3'-UTR peptides from 45 and 9 genes, respectively. The identified UTR peptides were validated by their high spectral similarity to the corresponding synthetic peptides. The 5'-UTR peptides revealed alternative initiation sites with non-AUG start codons, which conformed exactly to the Kozak contexts of annotated initiation sites. Notably, our approach can determine the translated amino acid sequences as well as provide evidence for UTR translation, whereas ribosome profiling provides only the translation evidence. For the previously reported stop codon readthrough in the MDH1 gene, we could confirm the amino acid inserted during the readthrough. Data are available via ProteomeXchange with identifier PXD016207.
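The database construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that near-cognate start codons are the nine codons differing from AUG by one nucleotide, that a non-AUG initiator is decoded as methionine, and that the residue inserted at a read-through stop codon is unknown and written as `X`. The function names (`translate`, `five_utr_candidates`, `readthrough_extension`) and parameters are hypothetical.

```python
from itertools import product

# Standard genetic code (DNA alphabet), built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: near-cognate start codons are the nine single-nucleotide
# variants of ATG; the set actually used in the paper may differ.
NEAR_COGNATE = {"CTG", "GTG", "TTG", "ACG", "AAG", "AGG", "ATC", "ATA", "ATT"}


def translate(seq):
    """Translate seq from its first base until the first stop codon or the end."""
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def five_utr_candidates(utr5, cds, starts=NEAR_COGNATE, initiator="M"):
    """Hypothetical 5'-UTR products: one per candidate start codon in the UTR.

    Translation runs from each start codon through the UTR and into the CDS
    until the first in-frame stop codon. The first residue is replaced by
    `initiator` (assumption: non-AUG initiation is decoded as Met).
    """
    transcript = utr5 + cds
    candidates = []
    for i in range(len(utr5) - 2):  # the start codon must lie within the UTR
        if transcript[i:i + 3] in starts:
            peptide = translate(transcript[i:])
            candidates.append((i, initiator + peptide[1:]))
    return candidates


def readthrough_extension(utr3, stop_placeholder="X"):
    """Hypothetical 3'-UTR product of stop codon readthrough.

    `utr3` starts immediately after the annotated stop codon. The residue
    inserted at that stop codon is unknown here, so it is a placeholder.
    """
    return stop_placeholder + translate(utr3)
```

With the toy inputs `five_utr_candidates("AACTGGCC", "ATGAAATAA")`, the single CTG in the UTR yields `[(2, "MAMK")]`, and `readthrough_extension("GGATCTTAGAAA")` yields `"XGS"`. A real database would then be digested in silico and searched alongside the reference proteome, which this sketch does not attempt.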

Keywords: non-AUG start codons; peptide identification; proteogenomics; tandem mass spectrometry; translational readthrough; untranslated regions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Codon, Initiator
  • Peptides / genetics
  • Proteogenomics*
  • Tandem Mass Spectrometry
  • Untranslated Regions

Substances

  • Codon, Initiator
  • Peptides
  • Untranslated Regions