A Characterization of the DNA Data Storage Channel

Reinhard Heckel; Gediminas Mikutis; Robert N Grass

doi:10.1038/s41598-019-45832-6

Sci Rep. 2019 Jul 4;9(1):9663. doi: 10.1038/s41598-019-45832-6.

Authors

Reinhard Heckel¹, Gediminas Mikutis², Robert N Grass²

Affiliations

¹ Rice University, Department of Electrical and Computer Engineering, Houston, 77005, Texas, USA. rh43@rice.edu.
² ETH Zurich, Department of Chemistry and Applied Biosciences, Zurich, 8093, Switzerland.

Abstract

Owing to its longevity and enormous information density, DNA, the molecule encoding biological information, has emerged as a promising archival storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules that are stored in an unordered way, and can only be read by sampling from this DNA pool. Moreover, imperfections in writing (synthesis), reading (sequencing), storage, and handling of the DNA, in particular amplification via PCR, lead to a loss of DNA molecules and induce errors within the molecules. In order to design DNA storage systems, a qualitative and quantitative understanding of the errors and the loss of molecules is crucial. In this paper, we characterize those error probabilities by analyzing data from our own experiments as well as from experiments of two different groups. We find that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences. The aim of our study is to help guide the design of future DNA data storage systems by providing a quantitative and qualitative understanding of the DNA data storage channel.
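
The channel described in the abstract (many short molecules stored without order, read by sampling from the pool, with errors inside molecules and loss of whole molecules) can be illustrated with a toy simulation. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: reads are drawn uniformly with replacement, errors are independent per base, and the function names `corrupt` and `dna_channel` as well as all probability values are hypothetical placeholders, not measured rates.

```python
import random

BASES = "ACGT"

def corrupt(seq, p_sub, p_ins, p_del, rng):
    """Apply independent per-base insertion, deletion, and substitution errors.

    Assumption: errors are i.i.d. across positions, a simplification
    chosen for illustration only.
    """
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # spurious base inserted before this one
        r = rng.random()
        if r < p_del:
            continue  # this base is deleted
        if r < p_del + p_sub:
            # replace the base with one of the three other bases
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)

def dna_channel(pool, n_reads, p_sub=0.0, p_ins=0.0, p_del=0.0, seed=0):
    """Draw n_reads molecules uniformly with replacement, then corrupt each.

    Molecules that are never drawn are effectively lost to the reader,
    and the output carries no information about the original order.
    """
    rng = random.Random(seed)
    return [
        corrupt(rng.choice(pool), p_sub, p_ins, p_del, rng)
        for _ in range(n_reads)
    ]

# Hypothetical pool of short sequences and placeholder error rates.
pool = ["ACGTACGTAC", "TTGACCAGTA", "GGCATCGATT", "CATGCATGCA"]
reads = dna_channel(pool, n_reads=6, p_sub=0.01, p_ins=0.005, p_del=0.005)

# With error-free reads, sequences that were never sampled can be counted directly.
clean_reads = dna_channel(pool, n_reads=3)
never_read = set(pool) - set(clean_reads)
```

Because only three error-free reads are drawn from four sequences in the last two lines, at least one sequence is never observed, which mirrors the loss of molecules the abstract refers to.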

MeSH terms

Algorithms*
DNA / analysis*
DNA / genetics*
Diagnostic Tests, Routine / standards*
High-Throughput Nucleotide Sequencing / methods
Humans
Information Storage and Retrieval / standards*
Research Design / standards
Sequence Analysis, DNA / methods*
Specimen Handling / standards*

Substances

DNA