Integration of mass spectrometry and RNA-Seq data to confirm human ab initio predicted genes and lncRNAs

Proteomics. 2014 Dec;14(23-24):2760-8. doi: 10.1002/pmic.201400174.

Abstract

MS/MS has been used to improve genome annotation in various organisms. The classical approach is to construct a comprehensive theoretical peptide database by six-frame translation of all ORFs of a genome and to search real MS/MS spectra against this database. In this work we took a more focused approach: we constructed a database containing only peptides from the ab initio predicted genes in the current human genome annotation, together with all theoretical peptides from currently annotated lncRNAs, and searched it with MS/MS data from the human HeLa cell line. The purpose of this design is to find translation evidence for ab initio predicted genes and to rule out possibly misannotated lncRNAs in a systematic proteogenomics effort. To validate the proteogenomics results, we integrated RNA-Seq data analysis for the same HeLa cell line that generated the MS/MS data, and performed MRM experiments on self-cultured HeLa cell samples. Six peptides were found to support ab initio predicted genes with both RNA-Seq and MRM validation, while no peptide was found to support translation of an lncRNA. This workflow could be flexibly applied to other human samples and datasets to help further improve human gene annotation.
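
As a rough illustration of how a theoretical peptide database can be derived from nucleotide sequences, the following minimal Python sketch translates a sequence in all six reading frames and digests each frame in silico. It is not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and the digestion rule (tryptic cleavage after K or R unless followed by P, no missed cleavages) and the minimum peptide length are assumptions, since the abstract does not state the protease or search settings.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def tryptic_peptides(protein, min_len=6):
    """Split at stop codons, then cleave after K/R unless followed by P.

    Assumed rule set: no missed cleavages, peptides shorter than
    min_len are discarded.
    """
    peptides = []
    for orf in protein.split("*"):
        start = 0
        for i, aa in enumerate(orf):
            at_end = i == len(orf) - 1
            cleaves = aa in "KR" and not at_end and orf[i + 1] != "P"
            if cleaves or at_end:
                pep = orf[start:i + 1]
                if len(pep) >= min_len:
                    peptides.append(pep)
                start = i + 1
    return peptides


def peptide_database(sequences, min_len=6):
    """Collect the unique theoretical peptides from a dict of name -> DNA."""
    db = set()
    for dna in sequences.values():
        for protein in six_frame_translations(dna).values():
            db.update(tryptic_peptides(protein, min_len))
    return db
```

Under these assumptions, the focused design described above would correspond to passing only the transcripts of ab initio predicted genes and annotated lncRNAs to `peptide_database`, rather than the whole genome, which keeps the search space small.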

Keywords: Bioinformatics; Gene annotation; MRM; MS/MS; Proteogenomics; RNA-Seq.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Humans
  • Mass Spectrometry / methods*
  • Molecular Sequence Annotation
  • Sequence Analysis, RNA / methods
  • Tandem Mass Spectrometry