Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry

J Proteome Res. 2021 Jan 1;20(1):261-269. doi: 10.1021/acs.jproteome.0c00369. Epub 2020 Nov 12.

Abstract

In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
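As a rough illustration of what a customized proteoform sequence database can contain, the sketch below adds one variant entry per single amino acid substitution to a reference protein and writes the result as FASTA text. It is not TopPG's implementation or interface: the names `AminoAcidVariant`, `apply_variant`, and `build_fasta`, the header format, and the example sequence are all hypothetical, and alternative splicing events are not handled.

```python
from typing import Iterable, NamedTuple


class AminoAcidVariant(NamedTuple):
    """Hypothetical record for one single amino acid substitution."""
    position: int  # 1-based position in the reference protein
    ref: str       # residue expected in the reference sequence
    alt: str       # sample-specific residue (e.g. called from RNA-seq)


def apply_variant(sequence: str, variant: AminoAcidVariant) -> str:
    """Return a copy of `sequence` carrying the given substitution."""
    index = variant.position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {variant.position} is outside the sequence")
    if sequence[index] != variant.ref:
        raise ValueError(
            f"expected {variant.ref} at {variant.position}, found {sequence[index]}"
        )
    return sequence[:index] + variant.alt + sequence[index + 1:]


def build_fasta(
    protein_id: str, sequence: str, variants: Iterable[AminoAcidVariant]
) -> str:
    """Emit the reference entry followed by one entry per variant."""
    records = [(protein_id, sequence)]
    for v in variants:
        # Assumed header convention: <protein id>_<ref><position><alt>
        header = f"{protein_id}_{v.ref}{v.position}{v.alt}"
        records.append((header, apply_variant(sequence, v)))
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    # Toy sequence and variant, for illustration only.
    print(build_fasta("P1", "MKTAY", [AminoAcidVariant(3, "T", "A")]), end="")
```

Keeping the reference entry next to each variant entry lets a database search match spectra against either form; a real tool would also need to handle combinations of variants and splice isoforms.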

Keywords: RNA-seq; proteogenomics; top-down mass spectrometry.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Databases, Protein
  • Mass Spectrometry
  • Proteogenomics*
  • Proteome* / genetics
  • RNA-Seq

Substances

  • Proteome