MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms

Bioinformatics. 2015 Jun 15;31(12):i106-15. doi: 10.1093/bioinformatics/btv236.

Abstract

Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information at the genomic and transcript level. In proteogenomics, these multi-omics data are combined to analyze unannotated organisms and to allow more accurate sample-specific predictions. Existing analysis methods still depend mainly on six-frame translations or on reference protein databases extended by transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frame translation (three reading frames on each strand) artificially inflates the target database sixfold, and SNP integration requires a suitable database summarizing results from previous experiments. We overcome these limitations by introducing MSProGene, a new method for integrative proteogenomic analysis based on customized, RNA-Seq-driven transcript databases. MSProGene is independent of existing reference databases or annotated SNPs and avoids large six-frame translated databases by constructing sample-specific transcripts. In addition, it creates a network combining RNA-Seq and peptide information that is optimized by a maximum-flow algorithm, which also allows it to resolve the ambiguity of shared peptides for protein inference. We applied MSProGene to three datasets and show that it enables reliable and accurate, database-independent prediction at the gene and protein level and additionally identifies novel genes.
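The abstract says only that peptide and RNA-Seq evidence are combined in a network optimized by maximum flow, which also distributes shared peptides among candidate proteins. It does not give the network layout. The sketch below is therefore hypothetical and is not MSProGene's implementation. It assumes a layout of source -> peptide -> transcript -> sink, with peptide capacities taken from spectral counts and transcript capacities taken from RNA-Seq read support. The names `build_network`, `SOURCE`, `SINK` and all example numbers are made up for illustration. The solver is the standard Edmonds-Karp algorithm, which augments along shortest residual paths.

```python
from collections import defaultdict, deque


def build_network(peptide_counts, peptide_to_transcripts, transcript_support):
    """Hypothetical layout (an assumption, not MSProGene's):
    SOURCE -> peptide (spectral count) -> transcript -> SINK (RNA-Seq support)."""
    cap = defaultdict(dict)
    for pep, count in peptide_counts.items():
        cap["SOURCE"][pep] = count
        for tx in peptide_to_transcripts[pep]:
            cap[pep][tx] = count  # a peptide can send at most its count to any transcript
    for tx, support in transcript_support.items():
        cap[tx]["SINK"] = support
    return cap


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along BFS (shortest) paths in the residual graph.
    Returns (total_flow, flow), where flow[u][v] is the net flow from u to v."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse (residual) edge exists
    flow = defaultdict(lambda: defaultdict(int))
    total = 0
    while True:
        # Breadth-first search for a path with remaining residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, flow  # no augmenting path left: flow is maximal
        # Find the bottleneck capacity along the path found.
        bottleneck = None
        v = sink
        while parent[v] is not None:
            u = parent[v]
            c = residual[u][v]
            bottleneck = c if bottleneck is None else min(bottleneck, c)
            v = u
        # Push the bottleneck amount and update the residual graph.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck


if __name__ == "__main__":
    # Hypothetical toy data: pepA is shared by transcripts T1 and T2.
    cap = build_network(
        {"pepA": 4, "pepB": 2, "pepC": 3},
        {"pepA": ["T1", "T2"], "pepB": ["T1"], "pepC": ["T2"]},
        {"T1": 3, "T2": 5},
    )
    total, flow = max_flow(cap, "SOURCE", "SINK")
    # Total flow is 8: transcript support (3 + 5) caps the 9 peptide counts.
    print("total flow:", total)
    print("shared pepA -> T1:", flow["pepA"]["T1"], " pepA -> T2:", flow["pepA"]["T2"])
```

In this toy network the RNA-Seq capacities bound how much of the shared peptide `pepA` each transcript can absorb. A maximum flow is generally not unique, though, so the exact split of `pepA` between T1 and T2 depends on the order in which paths are found. A real tool would need additional criteria to choose among equally optimal assignments.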

Availability and implementation: MSProGene is written in Java and Python. It is open source and available at http://sourceforge.net/projects/msprogene/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Bartonella / genetics
  • Databases, Genetic
  • Filarioidea / genetics
  • Gene Expression Profiling*
  • Genomics / methods*
  • Mass Spectrometry
  • Peptides / chemistry
  • Polymorphism, Single Nucleotide
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism
  • Proteomics / methods*
  • Sequence Analysis, RNA*
  • Software

Substances

  • Peptides
  • Proteins