Oligo Design with Single Primer Binding Site for High Capacity DNA-Based Data Storage

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2176-2182. doi: 10.1109/TCBB.2019.2940177. Epub 2020 Dec 8.

Abstract

DNA has become an attractive medium for long-term data archiving due to its extremely high storage density and longevity. Short single-stranded DNAs, called oligonucleotides (oligos), have been designed and synthesized to store digital data. Previous works designed the oligos with a pair of primer binding sites (PBSs) (each with a length of around 20 nucleotides) attached at the two ends of each basic readable data block. The addition of PBSs decreases the data density significantly because, in current DNA synthesis, the maximum length of a good-quality synthesized oligo is around 200 nucleotides. Furthermore, the maximum homopolymer run allowed in existing experiments has been reported to be three nucleotides. In this work, to increase the data density, we have devised and tested an oligo design for DNA-based storage in which the basic readable data block is appended with a single PBS at one end only, while the maximum allowed homopolymer run is increased to 4. We also present an oligo assembly algorithm that can reconstruct oligos with a single PBS from the error-prone raw readouts obtained from the sequencing process. We conducted a wet lab experiment to validate the proposed design, in which 398 KB of data were stored in 10,750 oligos. The experimental results show that it is possible to recover over 99 percent of the oligo sequences without error, demonstrating that one PBS is sufficient for implementing a DNA-based data storage system with the maximum homopolymer run relaxed to 4. By reserving more nucleotides for storing information bits, the use of a single PBS leads to a significant data density gain of 14.3 to 140.2 percent over existing short-strand DNA data storage schemes.
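
The two design constraints discussed in the abstract, the homopolymer run limit and the share of each oligo taken up by PBSs, can be illustrated with a minimal Python sketch. It is not the authors' encoder or assembly algorithm. The function names, the 200-nucleotide oligo length, and the 20-nucleotide PBS length are assumptions chosen for illustration, and the resulting fractions are not meant to reproduce the density gains reported above, which also depend on the encoding scheme being compared.

```python
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def satisfies_run_limit(seq: str, limit: int = 4) -> bool:
    """Check a sequence against a maximum homopolymer run (4 in this work)."""
    return max_homopolymer_run(seq) <= limit


def payload_fraction(oligo_len: int, pbs_len: int, num_pbs: int) -> float:
    """Fraction of an oligo left for non-PBS content (hypothetical helper)."""
    payload = oligo_len - pbs_len * num_pbs
    if payload < 0:
        raise ValueError("PBSs do not fit in the oligo")
    return payload / oligo_len


# Assumed illustrative values, not taken from the paper's experiment.
OLIGO_LEN = 200
PBS_LEN = 20

two_pbs = payload_fraction(OLIGO_LEN, PBS_LEN, num_pbs=2)
one_pbs = payload_fraction(OLIGO_LEN, PBS_LEN, num_pbs=1)

print(f"payload fraction with two PBSs: {two_pbs:.2f}")
print(f"payload fraction with one PBS:  {one_pbs:.2f}")
print("ACGTTTTA within run limit 4:", satisfies_run_limit("ACGTTTTA"))
print("ACGTTTTA within run limit 3:", satisfies_run_limit("ACGTTTTA", limit=3))
```

Under these assumed lengths, dropping one PBS frees 20 nucleotides per oligo for other content, and relaxing the run limit from 3 to 4 admits sequences such as ACGTTTTA that a stricter scheme would reject.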

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Computational Biology
  • Computers, Molecular*
  • DNA / chemistry*
  • DNA / metabolism
  • DNA Primers / chemistry*
  • DNA Primers / metabolism
  • High-Throughput Nucleotide Sequencing
  • Information Storage and Retrieval

Substances

  • DNA Primers
  • DNA