Genomic variability and protein species - Improving sequence coverage for proteogenomics

J Proteomics. 2016 Feb 16:134:25-36. doi: 10.1016/j.jprot.2015.09.021. Epub 2015 Sep 21.

Abstract

Protein heterogeneity may result from many factors, often closely related to the regulation of biological mechanisms. This review addresses one source of protein heterogeneity: the translation of genetic variability and transcriptional modulation to the protein level. We provide an overview of how customized protein sequence databases, generated from genomic and transcriptomic sequence information and used in conjunction with approaches that increase protein sequence coverage, can give deeper insight into variability at the protein level. Modern DNA/RNA sequencing approaches make it possible to obtain detailed sequence information from individual genomes and transcriptomes at single-nucleotide resolution. Further studies have tried to correlate genetic variability with important biological consequences, such as the risk of developing a disease, or to use it to define a personalized approach to therapy (also called "personalized" or "precision" medicine). Linking genomic and transcriptomic information to complex biological mechanisms has, however, remained elusive because there is no direct cause-and-effect relationship between changes at the DNA/RNA level and downstream consequences. In this review we give an overview of the challenges of integrating genomics and transcriptomics data with proteomics data, and we link variability at the DNA/RNA level to protein variability and protein species.

Biological significance: The manuscript focuses on a recent trend in proteomics, namely the integration of genomic and proteomic data. Genetic and transcriptomic variability accounts for a considerable part of protein variability and underlies many protein species. Many of these have not yet been described at the protein level, while many others have been identified as proteins or peptides of unknown function. The review highlights the challenges of current proteomics methodology, notably incomplete sequence coverage, which make it difficult to appreciate the full complexity of any proteome and mean that much of the variability at the DNA/RNA level is not captured at the protein level. We outline a few strategies to ameliorate this situation.
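
As an illustration of why sequence coverage and customized databases matter together, the following minimal sketch computes the fraction of a protein's residues covered by identified peptides, first against a reference sequence and then against a sequence carrying a single amino acid variant. The sequences, the peptide list, and the function name `sequence_coverage` are hypothetical and chosen only for illustration; they are not taken from the review, and exact substring matching stands in for a real database search.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Return the fraction of residues in `protein` covered by any peptide.

    Peptides are located by exact substring matching; every occurrence
    of a peptide in the protein counts towards coverage.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Hypothetical 16-residue reference and a variant with A -> V at index 6.
reference = "MKTAYIAKQRQISFVK"
variant = reference[:6] + "V" + reference[7:]

# Hypothetical identified peptides; the first one carries the variant residue.
identified = ["TAYIVK", "QISFVK"]

coverage_reference = sequence_coverage(reference, identified)
coverage_variant = sequence_coverage(variant, identified)
print(coverage_reference, coverage_variant)
```

In this toy case the variant-carrying peptide matches only the customized sequence, so coverage is higher against the variant entry than against the reference. A residue that no identified peptide spans stays invisible regardless of the database, which is the coverage limitation the review describes.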

Keywords: DNA/RNA sequencing; Genomics; Mass spectrometry; Proteogenomics; Proteomics.

Publication types

  • Review

MeSH terms

  • Animals
  • Genetic Variation*
  • Genomics / methods*
  • Humans
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Analysis, RNA / methods*