High information capacity DNA-based data storage with augmented encoding characters using degenerate bases

Sci Rep. 2019 Apr 29;9(1):6582. doi: 10.1038/s41598-019-43105-w.

Abstract

DNA-based data storage has emerged as a promising method to satisfy the exponentially increasing demand for information storage. However, practical implementation of DNA-based data storage remains a challenge because of the high cost of writing data through DNA synthesis. Here, we propose using degenerate bases as encoding characters in addition to A, C, G, and T. This increases the amount of data that can be stored per length of designed DNA sequence (information capacity) and lowers the amount of DNA synthesis required per unit of stored data. Using the proposed method, we experimentally achieved an information capacity of 3.37 bits/character, more than twice the highest information capacity previously achieved. In the future, the proposed method can be integrated with synthesis technologies to reduce the cost of DNA-based data storage by 50%.
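The minimal Python sketch below makes the notion of information capacity concrete. It assumes the augmented alphabet is the four canonical bases plus all 11 IUPAC degenerate codes, giving 15 characters. The abstract does not say which degenerate bases were used. The functions `capacity_upper_bound`, `chars_needed`, `encode`, and `decode` are hypothetical illustrations of plain radix conversion, not the authors' encoding scheme. They also ignore two things:

- error correction;
- the physical realization of a degenerate base as a nucleotide mixture, which has to be resolved by sequencing many copies.

```python
import math

# Assumption: the augmented alphabet is ACGT plus the 11 IUPAC degenerate codes.
CANONICAL = "ACGT"
DEGENERATE = "RYSWKMBDHVN"  # each code denotes a set of two or more bases
AUGMENTED = CANONICAL + DEGENERATE  # 15 characters


def capacity_upper_bound(alphabet: str) -> float:
    """Idealized bits per character when every symbol is equally likely."""
    return math.log2(len(alphabet))


def chars_needed(n_bytes: int, alphabet: str) -> int:
    """Smallest length L with len(alphabet)**L >= 256**n_bytes (exact integers)."""
    base, target = len(alphabet), 256 ** n_bytes
    length, span = 0, 1
    while span < target:
        span *= base
        length += 1
    return length


def encode(data: bytes, alphabet: str) -> str:
    """Hypothetical encoder: radix-convert the payload to a fixed-length string."""
    base = len(alphabet)
    n = int.from_bytes(data, "big")
    out = []
    for _ in range(chars_needed(len(data), alphabet)):
        n, r = divmod(n, base)
        out.append(alphabet[r])
    return "".join(reversed(out))


def decode(seq: str, n_bytes: int, alphabet: str) -> bytes:
    """Invert encode(); the reader must know the payload length in bytes."""
    base = len(alphabet)
    n = 0
    for ch in seq:
        n = n * base + alphabet.index(ch)
    return n.to_bytes(n_bytes, "big")


if __name__ == "__main__":
    payload = b"DNA storage"  # 11 bytes = 88 bits
    for name, alpha in [("ACGT only", CANONICAL), ("ACGT + degenerate", AUGMENTED)]:
        seq = encode(payload, alpha)
        assert decode(seq, len(payload), alpha) == payload
        print(f"{name}: {capacity_upper_bound(alpha):.2f} bits/char max, {len(seq)} chars")
    # ACGT only: 2.00 bits/char max, 44 chars
    # ACGT + degenerate: 3.91 bits/char max, 23 chars
```

For this 11-byte payload, the four-letter alphabet needs 44 characters while the 15-letter alphabet needs 23, roughly half as many. Synthesis length per stored bit scales inversely with information capacity, so a roughly twofold capacity gain roughly halves the synthesis required. That arithmetic is in line with the 50% cost reduction projected above. The reported 3.37 bits/character sits below the idealized bound computed here for a 15-character alphabet.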

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence / genetics
  • DNA / genetics*
  • Databases, Nucleic Acid*
  • Information Storage and Retrieval*
  • Sequence Analysis, DNA

Substances

  • DNA