A dual-rule encoding DNA storage system using chaotic mapping to control GC content

Bioinformatics. 2024 Mar 4;40(3):btae113. doi: 10.1093/bioinformatics/btae113.

Abstract

Motivation: Because of its high density and long-term reliability, DNA is considered a promising new storage medium for meeting the world's growing demand for data storage. However, early coding schemes pursued high density while ignoring the biological constraints on DNA sequences, which made the resulting sequences difficult to synthesize and sequence. This article proposes a novel DNA storage coding scheme in which half of the binary data is encoded with each of two encoding rules whose GC contents are complementary, yielding a single DNA sequence.
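The abstract does not give the two encoding rules. As a toy illustration only, the Python sketch below assumes two hypothetical 2-bit-to-base tables (`RULE_A` and `RULE_B`, not the paper's rules) that are GC-complementary in one simple sense: for every 2-bit symbol, exactly one of the two rules emits G or C. The first half of the bits is encoded with one rule and the second half with the other. The sketch ignores the chaotic mapping named in the title and enforces no biological constraints, so its 2 bit/nt density is not comparable to the reported 1.66 bit/nt.

```python
# Hypothetical rule tables: NOT the rules used in the paper.
# For every 2-bit symbol, exactly one of the two rules emits G or C,
# one simple way to make two rules "GC-content complementary".
RULE_A = {"00": "G", "01": "C", "10": "A", "11": "T"}
RULE_B = {"00": "A", "01": "T", "10": "G", "11": "C"}


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def encode_with(rule: dict[str, str], bits: str) -> str:
    """Map an even-length bit string to bases, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(rule[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dual_rule_encode(bits: str) -> str:
    """Encode the first half of bits with RULE_A and the second half with RULE_B."""
    mid = len(bits) // 2
    mid -= mid % 2  # keep the split on a 2-bit symbol boundary
    return encode_with(RULE_A, bits[:mid]) + encode_with(RULE_B, bits[mid:])


# Sanity check of the complementarity property on every symbol.
for sym in RULE_A:
    assert (RULE_A[sym] in "GC") != (RULE_B[sym] in "GC")
```

With this toy pair, `dual_rule_encode("00000000")` returns `"GGAA"`: two halves with identical symbol content, encoded under complementary rules, give exactly 50% GC. When the two halves differ in symbol distribution, this toy offers no such guarantee.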

Results: After simulated encoding of representative document and image file formats, the resulting DNA sequence strictly conformed to biological constraints and reached a coding potential of 1.66 bit/nt. A mechanism to prevent error propagation was introduced in the decoding process. The simulation results show that, with Reed-Solomon code added, 90% of the data can still be recovered after a 2% error rate is introduced, demonstrating that the proposed DNA storage scheme is highly robust and reliable.

Availability and implementation: The source code for the codec scheme described in this paper is available at https://github.com/Mooreniah/DNA-dual-rule-rotary-encoding-storage-system-DRRC.
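The Results report that the output satisfies biological constraints, reaches 1.66 bit/nt, and tolerates a 2% error rate. The Python sketch below shows one way such checks might be expressed. The GC window (40 to 60%) and maximum homopolymer length (3) are common choices in DNA storage work but are assumptions here, since the abstract does not state the paper's limits. The hypothetical `inject_substitutions` helper uses a substitution-only error model, which is also an assumption, and no Reed-Solomon decoding is included.

```python
import random
from itertools import groupby

# Assumed thresholds: the abstract does not state the limits used in the paper.
GC_MIN, GC_MAX = 0.40, 0.60
MAX_HOMOPOLYMER = 3
BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def satisfies_constraints(seq: str) -> bool:
    """True if GC content and homopolymer length fall within the assumed limits."""
    return (GC_MIN <= gc_content(seq) <= GC_MAX
            and longest_homopolymer(seq) <= MAX_HOMOPOLYMER)


def coding_potential(n_bits: int, n_nt: int) -> float:
    """Information density in bits per nucleotide."""
    return n_bits / n_nt


def inject_substitutions(seq: str, rate: float, seed: int = 0) -> str:
    """Substitute round(rate * len(seq)) bases at distinct random positions.

    Hypothetical helper with a substitution-only error model (an assumption;
    insertions and deletions are not simulated).
    """
    rng = random.Random(seed)
    bases = list(seq)
    k = round(rate * len(bases))
    for pos in rng.sample(range(len(bases)), k):
        # Replace with any base other than the current one.
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)
```

For example, `satisfies_constraints("ACGTTGCA")` is `True` (50% GC, longest run 2), while `"AAAAGGCC"` fails on its run of four A's. Storing 166 bits in 100 nt gives `coding_potential(166, 100)`, i.e. 1.66 bit/nt, and a 2% rate on a 1,000 nt sequence substitutes 20 bases.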

MeSH terms

  • Base Composition
  • DNA Replication
  • DNA* / genetics
  • Reproducibility of Results
  • Software*

Substances

  • DNA