Computational identification of micro-structural variations and their proteogenomic consequences in cancer

Bioinformatics. 2018 May 15;34(10):1672-1681. doi: 10.1093/bioinformatics/btx807.

Abstract

Motivation: Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples.

Results: We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure.

Availability and implementation: MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie.

Contact: cenksahi@indiana.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / genetics*
  • Female
  • Gene Expression Profiling / methods
  • Gene Expression Regulation, Neoplastic
  • Gene Fusion*
  • Humans
  • Mass Spectrometry / methods
  • Neoplasm Proteins / analysis
  • Neoplasm Proteins / genetics*
  • Proteogenomics / methods*
  • Sequence Analysis, RNA / methods
  • Software*

Substances

  • Neoplasm Proteins