Optimizing DNA assembly based on statistical language modelling

Gang Fang; Shemin Zhang; Yafei Dong

doi:10.1093/nar/gkx859

Nucleic Acids Res. 2017 Dec 15;45(22):e182. doi: 10.1093/nar/gkx859.

Authors

Gang Fang¹ ², Shemin Zhang³, Yafei Dong⁴

Affiliations

¹ Institute of Advanced Cyberspace Technology, Guangzhou University, Guangzhou 510006, China.
² Genetic Engineering Laboratory, School of Biological and Environmental Engineering, Xi'an University, Xi'an 710065, China.
³ School of Mathematics and Computer Science, Shaanxi University of Technology, Hanzhong 723001, China.
⁴ College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China.

Abstract

Complex genetic constructs composed of dozens of functional blocks can be built by successively assembling genetic parts, such as BioBricks, according to grammatical models. However, each category of genetic parts usually contains several or many parts. As the number of available parts grows, assembling more than a few sets of them becomes expensive, time-consuming and error-prone, and at the last step of assembly it is difficult to decide which part should be selected. Using a statistical language model, which is a probability distribution P(S) over strings S that attempts to reflect how frequently a string S occurs as a sentence, the most commonly used parts are selected. A dynamic programming algorithm was then designed to find the solution of maximum probability. The algorithm optimizes the results of a genetic design based on a grammatical model and finds an optimal solution. In this way, redundant operations can be reduced and the time and cost required for conducting biological experiments can be minimized.
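
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the order of the language model, so the sketch assumes a bigram model, in which the probability of a design is the probability of its first part multiplied by the conditional probability of each later part given its predecessor. The function name `best_assembly`, the part names, the probability values and the `floor` smoothing constant are all hypothetical. The dynamic programming is a Viterbi-style pass that, for every candidate part at each position, keeps only the most probable design ending in that part.

```python
import math


def best_assembly(slots, start_prob, bigram_prob, floor=1e-9):
    """Return (parts, log_prob) for the most probable sequence of parts.

    slots       : list of candidate-part lists, one list per position.
    start_prob  : dict part -> P(part occurs at the first position).
    bigram_prob : dict (prev, cur) -> P(cur | prev).
    floor       : probability assigned to unseen events. This is a crude
                  smoothing assumption, not something taken from the paper.
    """
    if not slots or any(not candidates for candidates in slots):
        raise ValueError("every position needs at least one candidate part")

    # best[p] = (log-probability of the best design ending in p, that design)
    best = {p: (math.log(start_prob.get(p, floor)), [p]) for p in slots[0]}

    for candidates in slots[1:]:
        extended = {}
        for cur in candidates:
            # Pick the predecessor that maximizes the probability of
            # reaching `cur`; log-probabilities are added, not multiplied.
            score, path = max(
                (
                    (s + math.log(bigram_prob.get((prev, cur), floor)), path)
                    for prev, (s, path) in best.items()
                ),
                key=lambda item: item[0],
            )
            extended[cur] = (score, path + [cur])
        best = extended

    score, path = max(best.values(), key=lambda item: item[0])
    return path, score


# Hypothetical design: promoter, RBS, coding sequence, terminator.
slots = [
    ["promoter_A", "promoter_B"],
    ["rbs_A", "rbs_B"],
    ["cds_A"],
    ["terminator_A", "terminator_B"],
]
start_prob = {"promoter_A": 0.7, "promoter_B": 0.3}
bigram_prob = {
    ("promoter_A", "rbs_A"): 0.2,
    ("promoter_A", "rbs_B"): 0.8,
    ("promoter_B", "rbs_A"): 0.9,
    ("promoter_B", "rbs_B"): 0.1,
    ("rbs_A", "cds_A"): 1.0,
    ("rbs_B", "cds_A"): 1.0,
    ("cds_A", "terminator_A"): 0.6,
    ("cds_A", "terminator_B"): 0.4,
}

parts, log_prob = best_assembly(slots, start_prob, bigram_prob)
print(parts, math.exp(log_prob))
```

In this toy example the sketch selects promoter_A, rbs_B, cds_A and terminator_A, with probability 0.7 × 0.8 × 1.0 × 0.6 = 0.336. The running time grows with the sum over adjacent positions of the product of their candidate counts, rather than with the product of all candidate counts as exhaustive enumeration would. In practice the probabilities would presumably be estimated from how often parts co-occur in existing designs; that estimation step is outside the scope of this sketch.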

MeSH terms

Algorithms*
DNA / genetics*
Genetic Engineering / methods
Models, Statistical*
Programming Languages*
Synthetic Biology / methods

Substances

DNA