Accounting for uncertainty in DNA sequencing data

Trends Genet. 2015 Feb;31(2):61-6. doi: 10.1016/j.tig.2014.12.002. Epub 2015 Jan 8.

Abstract

Science is defined in part by an honest exposition of the uncertainties that arise in measurements and propagate through calculations and inferences, so that the reliability of its conclusions is made apparent. The recent rapid development of high-throughput DNA sequencing technologies has dramatically increased the number of measurements made at the biochemical and molecular level. These data come from many different DNA-sequencing technologies, each with its own platform-specific errors and biases, which vary widely. Several statistical studies have tried to measure error rates for basic determinations, but there are no general schemes to project these uncertainties so as to assess the surety of the conclusions drawn about genetic, epigenetic, and more general biological questions. We review here the state of uncertainty quantification in DNA sequencing applications, describe sources of error, and propose methods that can be used to account for these errors and their uncertainties and to propagate them through subsequent calculations.

Keywords: DNA sequencing; sequence errors; uncertainty; uncertainty accounting.

Publication types

  • Review

MeSH terms

  • Base Sequence
  • High-Throughput Nucleotide Sequencing* / methods
  • High-Throughput Nucleotide Sequencing* / standards
  • Humans
  • Models, Statistical
  • Sequence Analysis, DNA* / methods
  • Sequence Analysis, DNA* / standards
  • Uncertainty