Codon usage clusters correlation: towards protein solubility prediction in heterologous expression systems in E. coli

Sci Rep. 2018 Jul 13;8(1):10618. doi: 10.1038/s41598-018-29035-z.

Abstract

Production of soluble recombinant proteins is crucial to the development of industry and basic research. However, aggregation caused by incorrect folding of nascent polypeptides is still a major bottleneck. Understanding the factors governing protein solubility is important to grasp the underlying mechanisms and improve the design of recombinant proteins. Here we show a quantitative study of the expression and solubility of a set of proteins from Bizionia argentinensis. Through the analysis of different features known to modulate protein production, we defined two parameters based on the %MinMax algorithm to compare codon usage clusters between the host and the target genes. We demonstrate that the absolute difference between all %MinMax frequencies of the host and the target gene is significantly negatively correlated with protein expression levels. Most importantly, a strong positive correlation between solubility and the degree of conservation of codon usage clusters is observed for two independent datasets. Moreover, we show that this correlation is higher in codon usage clusters involved in less compact protein secondary structure regions. Our results provide important tools for protein design and support the notion that codon usage may dictate translation rate and modulate co-translational folding.
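
The two %MinMax-based comparisons described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact definitions, so the sketch assumes a sliding-window %MinMax profile computed once with host codon frequencies and once with native-organism frequencies, then compared by mean absolute difference and by Pearson correlation. The codon table, the frequency values, the window length, and all function names are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical subset of synonymous-codon families (not the full genetic code).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "L": ["CTG", "CTA", "TTA"],
}
CODON_TO_AA = {c: aa for aa, cs in SYNONYMS.items() for c in cs}

# Made-up relative codon frequencies for two organisms (illustrative values only).
HOST = {"AAA": 0.7, "AAG": 0.3, "GAA": 0.6, "GAG": 0.4,
        "CTG": 0.8, "CTA": 0.05, "TTA": 0.15}
NATIVE = {"AAA": 0.4, "AAG": 0.6, "GAA": 0.5, "GAG": 0.5,
          "CTG": 0.2, "CTA": 0.3, "TTA": 0.5}


def percent_min_max(codons, usage, window=18):
    """Sliding-window %MinMax profile; window length is an assumption here."""
    profile = []
    for start in range(len(codons) - window + 1):
        actual = maximum = minimum = average = 0.0
        for codon in codons[start:start + window]:
            family = [usage[c] for c in SYNONYMS[CODON_TO_AA[codon]]]
            actual += usage[codon]
            maximum += max(family)
            minimum += min(family)
            average += mean(family)
        if actual >= average:
            # Positive values: window is enriched in frequent codons (%Max).
            denom = maximum - average
            value = 100.0 * ((actual - average) / denom) if denom else 0.0
        else:
            # Negative values: window is enriched in rare codons (%Min).
            denom = average - minimum
            value = -100.0 * ((average - actual) / denom) if denom else 0.0
        profile.append(value)
    return profile


def mean_abs_difference(p, q):
    """Mean absolute difference between two equal-length profiles."""
    return mean(abs(a - b) for a, b in zip(p, q))


def pearson(p, q):
    """Pearson correlation between two equal-length profiles."""
    mp, mq = mean(p), mean(q)
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    var_p = sum((a - mp) ** 2 for a in p)
    var_q = sum((b - mq) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_p * var_q)


# Toy gene, given as a list of codons.
gene = ["AAA", "GAG", "CTG", "AAG", "GAA", "CTA", "TTA", "AAA"]
host_profile = percent_min_max(gene, HOST, window=3)
native_profile = percent_min_max(gene, NATIVE, window=3)
print("mean |difference|:", mean_abs_difference(host_profile, native_profile))
print("profile correlation:", pearson(host_profile, native_profile))
```

In this sketch a small mean absolute difference and a high profile correlation both indicate that clusters of frequent and rare codons fall at similar positions in the two organisms. Real use would need complete codon usage tables for the host and the source organism.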

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism*
  • Codon
  • Escherichia coli / genetics*
  • Escherichia coli / metabolism
  • Flavobacteriaceae / genetics*
  • Flavobacteriaceae / metabolism
  • Protein Biosynthesis / genetics*
  • Protein Structure, Secondary / genetics
  • Recombinant Proteins / chemistry
  • Recombinant Proteins / genetics
  • Recombinant Proteins / metabolism
  • Solubility

Substances

  • Bacterial Proteins
  • Codon
  • Recombinant Proteins