Model-driven design of synthetic N-terminal coding sequences for regulating gene expression in yeast and bacteria

Biotechnol J. 2022 May;17(5):e2100655. doi: 10.1002/biot.202100655. Epub 2022 Feb 1.

Abstract

N-terminal coding sequences (NCSs) are key regulatory elements for fine-tuning gene expression during translation initiation, the rate-limiting step of translation. However, owing to the complex combinatorial effects of NCS biophysical factors and endogenous regulation, designing NCSs remains challenging. In this study, a multi-view learning strategy for model-driven generation of synthetic NCSs is implemented for Saccharomyces cerevisiae and Bacillus subtilis, two organisms widely used in laboratories and industry. NCS libraries for S. cerevisiae and B. subtilis, comprising nearly 150,000 cells, were sorted. Next, models were trained on deep features extracted from the DNA, codon, and amino acid sequences of the NCSs, together with calculated features: the minimum free energy (MFE) and the tRNA adaptation index. Two models were developed separately to generate synthetic NCSs that up- or down-regulate gene expression, with accuracies higher than 65% for both S. cerevisiae and B. subtilis. Synthetic NCSs were then applied to enhance bioproduction, yielding 1.48- and 1.71-fold production improvements of D-limonene by S. cerevisiae and ovalbumin by B. subtilis, respectively. This work provides model-driven design of synthetic NCSs as a toolbox for regulating gene expression in S. cerevisiae and B. subtilis. The machine learning-based modeling approach can be used for NCS design in other microorganisms.
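
To illustrate what "multi-view" means for an NCS, the sketch below splits one coding sequence into the three views named in the abstract (DNA, codon, amino acid) and computes a tAI-style score as the geometric mean of per-codon weights. It is not the authors' pipeline: the function names `ncs_views` and `geometric_mean_weight` are hypothetical, the codon weights in the usage note are made-up placeholders rather than measured tRNA adaptiveness values, and the MFE feature is omitted because it would require an RNA folding tool.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def ncs_views(dna):
    """Return the DNA, codon, and amino acid views of an in-frame sequence.

    Hypothetical helper: assumes `dna` contains only A/C/G/T and starts
    at the first base of the first codon.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    amino_acids = [CODON_TABLE[c] for c in codons]
    return {"dna": list(dna), "codon": codons, "amino_acid": amino_acids}


def geometric_mean_weight(codons, weights):
    """Geometric mean of per-codon weights (the form used by tAI-style indices).

    `weights` maps codon -> positive weight; the values are supplied by the
    caller and are an assumption of this sketch.
    """
    logs = [math.log(weights[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))
```

For example, `ncs_views("ATGGCTAAA")["codon"]` gives `["ATG", "GCT", "AAA"]`, and with placeholder weights `{"ATG": 1.0, "GCT": 0.25}` the score for `["ATG", "GCT"]` is the square root of 0.25. In a multi-view model, each view would typically be encoded separately before the representations are combined with calculated features such as this score.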

Keywords: Bacillus subtilis; N-terminal coding sequences; Saccharomyces cerevisiae; biosynthesis pathway; multi-view learning.

MeSH terms

  • Bacillus subtilis / genetics
  • Bacillus subtilis / metabolism
  • Codon / metabolism
  • Gene Expression
  • Saccharomyces cerevisiae* / genetics
  • Saccharomyces cerevisiae* / metabolism
  • Yeast, Dried*

Substances

  • Codon