Bridging the gap between transcriptome and proteome measurements identifies post-translationally regulated genes

Bioinformatics. 2013 Dec 1;29(23):3060-6. doi: 10.1093/bioinformatics/btt537. Epub 2013 Sep 16.

Abstract

Motivation: Despite much dynamical cellular behaviour being achieved by accurate regulation of protein concentrations, messenger RNA abundances, measured by microarray technology, and more recently by deep sequencing techniques, are widely used as proxies for protein measurements. Although for some species and under some conditions, there is good correlation between transcriptome and proteome level measurements, such correlation is by no means universal due to post-transcriptional and post-translational regulation, both of which are highly prevalent in cells. Here, we seek to develop a data-driven machine learning approach to bridging the gap between these two levels of high-throughput omic measurements on Saccharomyces cerevisiae and deploy the model in a novel way to uncover mRNA-protein pairs that are candidates for post-translational regulation.

Results: Feature selection by sparsity-inducing regression (l₁ norm regularization) yields a stable set of features, namely mRNA, ribosomal occupancy, ribosome density, tRNA adaptation index and codon bias, reducing the number of features from 37 to 5. A linear predictor built on these features predicts protein concentrations fairly accurately (R² = 0.86). Proteins whose concentration cannot be predicted accurately, taken as outliers with respect to the predictor, are shown to have annotation evidence of post-translational modification significantly more often than random subsets of similar size (P < 0.02). In a data mining sense, this work also makes a wider point: outliers with respect to a learning method can carry meaningful information about a problem domain.
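
The pipeline summarised in the Results (l₁-regularized feature selection, a linear predictor on the surviving features, outlier flagging, and a comparison against random subsets of the same size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the coordinate-descent lasso solver, the regularization strength `lam=0.25`, the median-absolute-deviation outlier rule with a 3.5 cut-off, and the `has_ptm` annotation vector are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-in data (NOT the yeast data set used in the paper) ---
n_genes, n_features, n_true = 500, 37, 5
X = rng.standard_normal((n_genes, n_features))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardise each feature
true_w = np.zeros(n_features)
true_w[:n_true] = [2.0, 1.5, 1.0, 1.0, 0.8]      # assumption: only 5 features matter
y = X @ true_w + 0.5 * rng.standard_normal(n_genes)

# Assumption: plant 20 genes whose level departs from the linear trend,
# standing in for post-translationally regulated proteins.
planted = rng.choice(n_genes, size=20, replace=False)
y[planted] += np.where(np.arange(20) % 2 == 0, 4.0, -4.0)
y = y - y.mean()


def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                             # equals y - Xw for w = 0
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * w[j]              # take feature j out of the fit
            rho = X[:, j] @ resid / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * w[j]              # put the updated feature back
    return w


# Step 1: sparse feature selection (non-zero lasso coefficients).
w_lasso = lasso_cd(X, y, lam=0.25)
selected = np.flatnonzero(w_lasso)

# Step 2: least-squares refit of a linear predictor on the selected features.
A = np.column_stack([np.ones(n_genes), X[:, selected]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
r2 = 1.0 - (resid @ resid) / (y @ y)             # y is centred, so y @ y is the total SS

# Step 3: flag poorly predicted genes with a robust z-score on the residuals.
centre = np.median(resid)
mad = np.median(np.abs(resid - centre))
robust_z = (resid - centre) / (1.4826 * mad)
outliers = np.flatnonzero(np.abs(robust_z) > 3.5)

# Step 4: empirical P-value against random gene subsets of the same size.
has_ptm = rng.random(n_genes) < 0.3              # hypothetical background annotation
has_ptm[planted] = True                          # assumption: planted genes are annotated
observed = has_ptm[outliers].sum()
n_draws = 1000
random_counts = np.array([
    has_ptm[rng.choice(n_genes, size=outliers.size, replace=False)].sum()
    for _ in range(n_draws)
])
p_value = (1 + np.sum(random_counts >= observed)) / (1 + n_draws)

print(f"selected features: {selected.tolist()}")
print(f"R^2 = {r2:.2f}, outliers = {outliers.size}, empirical P = {p_value:.3f}")
```

Refitting by ordinary least squares after selection removes the shrinkage that the l₁ penalty imposes on the retained coefficients, so the residuals used for outlier detection reflect prediction error rather than regularization bias. The numbers printed by this sketch describe the synthetic data only and are not comparable to the values reported in the abstract.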

MeSH terms

  • Artificial Intelligence
  • Codon / metabolism
  • Computational Biology / methods*
  • Gene Expression Regulation, Fungal*
  • Protein Processing, Post-Translational*
  • Proteome / analysis*
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • RNA, Transfer / genetics
  • RNA, Transfer / metabolism
  • Ribosomes / metabolism
  • Saccharomyces cerevisiae / genetics*
  • Saccharomyces cerevisiae / metabolism
  • Saccharomyces cerevisiae Proteins / genetics*
  • Saccharomyces cerevisiae Proteins / metabolism
  • Transcriptome*

Substances

  • Codon
  • Proteome
  • RNA, Messenger
  • Saccharomyces cerevisiae Proteins
  • RNA, Transfer