Optimizing DNA assembly based on statistical language modelling

Nucleic Acids Res. 2017 Dec 15;45(22):e182. doi: 10.1093/nar/gkx859.

Abstract

Complex genetic constructs composed of dozens of functional blocks can be built by successively assembling genetic parts, such as BioBricks, according to grammatical models. However, each category of genetic parts usually contains anywhere from a few to many parts. As the number of genetic parts grows, assembling more than a few sets of them can be expensive, time consuming and error prone, and at the last step of assembly it is difficult to decide which part should be selected. Using a statistical language model, which is a probability distribution P(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence, the most commonly used parts are selected. A dynamic programming algorithm was then designed to find the solution of maximum probability. The algorithm optimizes the results of a genetic design based on a grammatical model and finds an optimal solution. In this way, redundant operations can be reduced and the time and cost required for conducting biological experiments can be minimized.
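
The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not state the order of the language model, so a bigram model (each part depends only on the part before it) is assumed here, and the function name `best_assembly`, the part names and all probabilities are hypothetical. Given one list of candidate parts per category, a Viterbi-style dynamic program returns the sequence with the highest probability P(s).

```python
import math


def log_prob(p):
    """Log of a probability, mapping 0 to -inf instead of raising an error."""
    return math.log(p) if p > 0 else -math.inf


def best_assembly(categories, start_prob, trans_prob):
    """Pick one part per category so that the bigram probability P(s) is maximal.

    categories : list of lists, the candidate parts for each position (assumed layout)
    start_prob : dict part -> P(part) for the first position
    trans_prob : dict (prev_part, part) -> P(part | prev_part); missing pairs count as 0
    """
    # score[p] = best log-probability of any partial design ending in part p
    score = {p: log_prob(start_prob.get(p, 0.0)) for p in categories[0]}
    back = []  # one dict per later position: part -> best predecessor

    for parts in categories[1:]:
        new_score, pointers = {}, {}
        for p in parts:
            # Choose the predecessor that maximizes the running log-probability.
            prev = max(
                score,
                key=lambda q: score[q] + log_prob(trans_prob.get((q, p), 0.0)),
            )
            new_score[p] = score[prev] + log_prob(trans_prob.get((prev, p), 0.0))
            pointers[p] = prev
        score = new_score
        back.append(pointers)

    # Trace the best final part back to the first position.
    last = max(score, key=score.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(score[last])


# Hypothetical example: promoter -> RBS -> reporter, with made-up usage frequencies.
categories = [["pA", "pB"], ["rbs1", "rbs2"], ["gfp", "rfp"]]
start_prob = {"pA": 0.6, "pB": 0.4}
trans_prob = {
    ("pA", "rbs1"): 0.5, ("pA", "rbs2"): 0.5,
    ("pB", "rbs1"): 0.9, ("pB", "rbs2"): 0.1,
    ("rbs1", "gfp"): 0.7, ("rbs1", "rfp"): 0.3,
    ("rbs2", "gfp"): 0.2, ("rbs2", "rfp"): 0.8,
}

path, prob = best_assembly(categories, start_prob, trans_prob)
print(path, round(prob, 3))  # ['pB', 'rbs1', 'gfp'] 0.252
```

In this toy data the most frequent first part (`pA`) is not part of the best overall design, which is why a dynamic program over whole sequences is used in the sketch rather than a greedy choice at each position. Working in log space avoids numerical underflow when constructs contain dozens of parts.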

MeSH terms

  • Algorithms*
  • DNA / genetics*
  • Genetic Engineering / methods
  • Models, Statistical*
  • Programming Languages*
  • Synthetic Biology / methods

Substances

  • DNA