Sequence determination from overlapping fragments: a simple model of whole-genome shotgun sequencing

Phys Rev Lett. 2002 Feb 11;88(6):068106. doi: 10.1103/PhysRevLett.88.068106. Epub 2002 Jan 28.

Abstract

Assembling fragments sampled at random along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and for the case of constant overlaps. For the general problem, we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.
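
The recovery question posed in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch and not the paper's calculation or either of its two assembly strategies. It assumes fixed-length fragments with uniformly random start positions, and it tests only a necessary condition for recovery: that the fragments tile the whole sequence with every consecutive pair overlapping by at least `min_overlap` symbols. The function names and all parameters (`seq_len`, `frag_len`, `n_frags`, `min_overlap`, `trials`, `seed`) are hypothetical and chosen for illustration.

```python
import random


def sample_fragment_starts(seq_len, frag_len, n_frags, rng):
    """Draw sorted fragment start positions uniformly at random.

    Assumption (not from the paper): every fragment has the same length
    and lies entirely inside the sequence.
    """
    return sorted(rng.randrange(seq_len - frag_len + 1) for _ in range(n_frags))


def is_chain_covered(starts, seq_len, frag_len, min_overlap):
    """Check that sorted fragments span [0, seq_len) as a single chain.

    Each fragment must overlap the region covered so far by at least
    min_overlap symbols. This is a necessary condition for recovery,
    not a full assembly.
    """
    if not starts or starts[0] != 0:
        return False
    covered_end = starts[0] + frag_len  # exclusive end of the covered region
    for start in starts[1:]:
        if covered_end - start < min_overlap:
            return False  # gap, or overlap too short to join
        covered_end = max(covered_end, start + frag_len)
    return covered_end >= seq_len


def estimate_recovery_probability(seq_len, frag_len, n_frags, min_overlap,
                                  trials=2000, seed=0):
    """Estimate the fraction of random fragment sets passing the chain test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        starts = sample_fragment_starts(seq_len, frag_len, n_frags, rng)
        if is_chain_covered(starts, seq_len, frag_len, min_overlap):
            hits += 1
    return hits / trials
```

Calling, for example, `estimate_recovery_probability(200, 40, 60, 5)` returns an estimate between 0 and 1. Because the check requires fragments to reach both ends of a linear sequence, the estimate is sensitive to end effects, and it ignores ambiguity from repeated subsequences, which matters for a finite alphabet.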

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • DNA / genetics*
  • Genome*
  • Humans
  • Models, Genetic*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA