An exploratory in silico comparison of open-source codon harmonization tools

Microb Cell Fact. 2023 Nov 6;22(1):227. doi: 10.1186/s12934-023-02230-y.

Abstract

Background: Not changing the native constitution of genes prior to their expression by a heterologous host can affect the amount of proteins synthesized as well as their folding, hampering their activity and even cell viability. Over the past decades, several strategies have been developed to optimize the translation of heterologous genes by accommodating the difference in codon usage between species. While there have been a handful of studies assessing various codon optimization strategies, to the best of our knowledge, no research has been performed towards the evaluation and comparison of codon harmonization algorithms. To highlight their importance and encourage meaningful discussion, we compared different open-source codon harmonization tools pertaining to their in silico performance, and we investigated the influence of different gene-specific factors.

Results: In total, 27 genes were harmonized with four tools toward two different heterologous hosts. The difference in %MinMax values between the harmonized and the original sequences was calculated (ΔMinMax), and statistical analysis of the obtained results was carried out. It became clear that not all tools perform similarly, and the choice of tool should depend on the intended application. Almost all biological factors under investigation (GC content, RNA secondary structures and choice of heterologous host) had a significant influence on the harmonization results and thus must be taken into account. These findings were substantiated using a validation dataset consisting of 8 strategically chosen genes.

Conclusions: Due to the size of the dataset, no complex models could be developed. However, this initial study showcases significant differences between the results of various codon harmonization tools. Although more elaborate investigation is needed, it is clear that biological factors such as GC content, RNA secondary structures and heterologous hosts must be taken into account when selecting the codon harmonization tool.

Keywords: CHARMING; Codon Harmonization; Codon usage Bias; CodonWizard; EuGene; Galaxy; Synthetic Biology.

MeSH terms

  • Algorithms*
  • Biological Factors
  • Codon
  • Codon Usage
  • Proteins* / genetics

Substances

  • Codon
  • Proteins
  • Biological Factors