Designing robust watermark barcodes for multiplex long-read sequencing

Bioinformatics. 2017 Mar 15;33(6):807-813. doi: 10.1093/bioinformatics/btw322.

Abstract

Motivation: To attain acceptable sample misassignment rates, current approaches to multiplex single-molecule real-time sequencing require upstream quality improvement, which is obtained from multiple passes over the sequenced insert and significantly reduces the effective read length. To fully exploit the raw read length in multiplex applications, robust barcodes capable of withstanding the full single-pass error rates are needed.

Results: We present a method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing. The manuscript focuses on the design of barcodes for full-length single-pass reads, which are impaired by challenging error rates on the order of 11%. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10⁻⁷ under these conditions, and are designed to be compatible with chemical constraints imposed by the sequencing process.
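
To make the error setting concrete, the following is a minimal sketch of how sample misassignment could be estimated by simulation. It is not the watermark construction or the decoder described in the paper: it swaps in random barcodes and nearest-neighbour assignment by Levenshtein distance as a simple baseline. The function names, the barcode length, and the split of the roughly 11% total error rate into insertions, deletions and substitutions are all assumptions chosen for illustration, as is the simplification that the corrupted barcode has already been cut out of the read.

```python
import random

ALPHABET = "ACGT"


def corrupt(seq, p_sub, p_ins, p_del, rng):
    """Pass seq through a simple i.i.d. insertion/deletion/substitution channel."""
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))  # spurious base inserted before this one
        r = rng.random()
        if r < p_del:
            continue  # base deleted
        if r < p_del + p_sub:
            out.append(rng.choice([b for b in ALPHABET if b != base]))  # substitution
        else:
            out.append(base)  # base read correctly
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def assign(read, barcodes):
    """Return the index of the unique closest barcode, or None on a tie."""
    dists = [edit_distance(read, bc) for bc in barcodes]
    best = min(dists)
    if dists.count(best) > 1:
        return None  # ambiguous: leave the read unassigned
    return dists.index(best)


def estimate_misassignment(n_barcodes=8, length=32, n_reads=100,
                           p_sub=0.02, p_ins=0.06, p_del=0.03, seed=0):
    """Fraction of simulated reads assigned to the wrong sample.

    The default error split (assumed, not taken from the paper) sums to 11%.
    Unassigned reads (ties) are not counted as misassignments.
    """
    rng = random.Random(seed)
    barcodes = ["".join(rng.choice(ALPHABET) for _ in range(length))
                for _ in range(n_barcodes)]
    wrong = 0
    for _ in range(n_reads):
        true_idx = rng.randrange(n_barcodes)
        read = corrupt(barcodes[true_idx], p_sub, p_ins, p_del, rng)
        guess = assign(read, barcodes)
        if guess is not None and guess != true_idx:
            wrong += 1
    return wrong / n_reads


if __name__ == "__main__":
    print("estimated misassignment rate:", estimate_misassignment())
```

A simulation of this size can only resolve misassignment rates far coarser than 10⁻⁷, so it serves to illustrate the channel model and the evaluation loop rather than to reproduce the reported figures.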

Availability and implementation: Software tools for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, are freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark.

Contact: ezpeleta@cifasis-conicet.gov.ar.

MeSH terms

  • Computer Simulation
  • High-Throughput Nucleotide Sequencing / methods*
  • Sequence Analysis, DNA / methods*
  • Software*