Discovery of novel genes and gene isoforms by integrating transcriptomic and proteomic profiling from mouse liver

J Proteome Res. 2014 May 2;13(5):2409-19. doi: 10.1021/pr4012206. Epub 2014 Apr 18.

Abstract

Comprehensively identifying gene expression at both the transcriptomic and proteomic levels of a tissue is a prerequisite for a deeper understanding of its biological functions. Alternative splicing and RNA editing, two main forms of transcriptional processing, contribute substantially to transcriptome and proteome diversity and give rise to multiple isoforms of a single gene. These isoforms are hard to identify with mass spectrometry (MS)-based proteomics because standard protein databases contain relatively little isoform information. In this study, we applied MS and RNA-Seq in parallel to mouse liver tissue and captured a considerable catalogue of transcripts and proteins that covered 60% and 34%, respectively, of the protein-coding genes in Ensembl. We then developed a bioinformatics workflow for building a customized protein database that, for the first time, included new splicing-derived peptides and peptide variants caused by RNA editing, allowing more complete identification of protein isoforms. Using this experimentally determined database, we identified a total of 150 peptides not present in standard biological databases at a false discovery rate of <1%, corresponding to 72 novel splicing isoforms, 43 new genetic regions, and 15 RNA-editing sites. Of these, 11 randomly selected novel events passed experimental verification by PCR and Sanger sequencing. These high-confidence discoveries of gene products at two omics levels demonstrate the robustness and effectiveness of our approach and its potential application to improving genome annotation. All the MS data have been deposited in iProx ( http://www.iprox.org ) with the identifier IPX00003601.
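The idea of a customized database containing RNA-editing-caused peptide variants can be illustrated with a minimal Python sketch. It is not the authors' workflow: the function names, the toy coding sequence, the edited position, the trypsin rule (cleave after K or R unless followed by P), and the minimum peptide length of 6 are all assumptions made for illustration. The sketch treats an A-to-I edit as an A-to-G change, translates the edited sequence, digests it in silico, and keeps only peptides absent from the reference protein.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_a_to_g_edit(cds, pos):
    """Model an A-to-I editing site as A-to-G at 0-based position pos (assumption)."""
    if cds[pos] != "A":
        raise ValueError("position %d is not an adenosine" % pos)
    return cds[:pos] + "G" + cds[pos + 1:]


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def novel_peptides(candidate, references, min_len=6):
    """Return candidate peptides of at least min_len that no reference protein yields."""
    known = {p for ref in references for p in tryptic_digest(ref)}
    return [p for p in tryptic_digest(candidate)
            if len(p) >= min_len and p not in known]


# Toy example (hypothetical sequence): editing position 12 turns ATT (I) into GTT (V).
reference_cds = "ATGGCTAAAGGTATTTTCTCTGAATGGCGTCTGAACAAATAA"
edited_cds = apply_a_to_g_edit(reference_cds, 12)
reference_protein = translate(reference_cds)
edited_protein = translate(edited_cds)
variant_entries = novel_peptides(edited_protein, [reference_protein])
```

In a real search, variant peptides such as those in `variant_entries` would be appended to the standard protein database before spectra are matched, and identifications would still need filtering at a chosen false discovery rate.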

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing
  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Databases, Protein
  • Gene Expression Profiling / methods*
  • Liver / metabolism*
  • Male
  • Mass Spectrometry
  • Mice, Inbred C57BL
  • Molecular Sequence Data
  • Peptides / genetics
  • Peptides / metabolism
  • Protein Isoforms / genetics
  • Protein Isoforms / metabolism
  • Proteins / genetics*
  • Proteins / metabolism*
  • Proteomics / methods*
  • RNA Editing
  • Sequence Analysis, RNA

Substances

  • Peptides
  • Protein Isoforms
  • Proteins