DNA-Based Concatenated Encoding System for High-Reliability and High-Density Data Storage

Small Methods. 2022 Apr;6(4):e2101335. doi: 10.1002/smtd.202101335. Epub 2022 Feb 10.

Abstract

Information storage based on DNA molecules is a promising solution that offers low energy consumption, high storage efficiency, and a long lifespan. However, there are only four natural nucleotides, so DNA storage is limited to 2 bits per nucleotide. Here, artificial nucleotides are introduced into DNA data storage to achieve a coding efficiency higher than 2 bits per nucleotide. To accommodate the characteristics of DNA synthesis and sequencing, two high-reliability encoding systems suitable for four, six, and eight nucleotides are developed: the RaptorQ-Arithmetic-LZW-RS (RALR) and RaptorQ-Arithmetic-Base64-RS (RABR) systems. The two concatenated encoding systems can correct DNA sequence losses, correct errors within DNA sequences, reduce homopolymers, and control specific nucleotide contents. The average coding efficiencies of the RALR system with error correction and without arithmetic compression reach 1.27, 1.61, and 1.85 bits per nucleotide using four, six, and eight nucleotides, respectively, whereas the average coding efficiencies of the RABR system are up to 1.50, 2.00, and 2.35 bits per nucleotide, respectively. The coding efficiency, versatility, and tunability of the developed artificial DNA systems might provide significant guidance for high-reliability and high-density data storage.
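
The 2 bits per nucleotide limit quoted above follows from log2(4) = 2 for an alphabet of four symbols. As a rough illustration, the sketch below computes the same bound for six and eight nucleotides and compares it with the efficiencies reported in the abstract. The bounds for six and eight nucleotides are not stated in the abstract; they assume an unconstrained alphabet in which every nucleotide is equally usable. The function names are hypothetical and are not taken from the paper.

```python
import math

# Average coding efficiencies (bits per nucleotide) as reported in the abstract,
# keyed by encoding system and then by alphabet size.
REPORTED = {
    "RALR": {4: 1.27, 6: 1.61, 8: 1.85},
    "RABR": {4: 1.50, 6: 2.00, 8: 2.35},
}


def max_bits_per_nucleotide(alphabet_size: int) -> float:
    """Upper bound on information per symbol for an unconstrained alphabet.

    Assumption: all symbols are equally usable and no synthesis or
    sequencing constraints (homopolymers, nucleotide content) apply.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    return math.log2(alphabet_size)


def fraction_of_bound(efficiency: float, alphabet_size: int) -> float:
    """Reported efficiency as a fraction of the unconstrained bound."""
    return efficiency / max_bits_per_nucleotide(alphabet_size)


if __name__ == "__main__":
    for system, by_size in REPORTED.items():
        for n, eff in by_size.items():
            bound = max_bits_per_nucleotide(n)
            share = fraction_of_bound(eff, n)
            print(f"{system}, {n} nucleotides: {eff:.2f} of {bound:.3f} bits/nt ({share:.0%})")
```

The gap between each reported value and its bound is expected, since the reported efficiencies include the overhead of error correction and of the constraints on homopolymers and nucleotide content.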

Keywords: DNA; artificial nucleotides; data storage; encoding systems; high-density data storage.

MeSH terms

  • DNA* / genetics
  • Information Storage and Retrieval*
  • Nucleotides
  • Reproducibility of Results
  • Sequence Analysis, DNA

Substances

  • Nucleotides
  • DNA