Efficient algorithms for counting and reporting segregating sites in genomic sequences

J Comput Biol. 2007 Sep;14(7):1001-10. doi: 10.1089/cmb.2006.0136.

Abstract

The number of segregating sites provides an indicator of the degree of DNA sequence variation that is present in a sample, and has been of great interest to the biological, pharmaceutical and medical professions. In this paper, we first provide linear- and expected-sublinear-time algorithms for finding all the segregating sites of a given set of DNA sequences. We also describe a data structure for tracking segregating sites in a set of sequences, such that every time the set is updated with the insertion of a new sequence or removal of an existing one, the segregating sites are updated accordingly without the need to re-scan the entire set of sequences.
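The two tasks in the abstract can be illustrated with a minimal Python sketch. It assumes the input is a set of aligned DNA sequences of equal length, and it treats a site as segregating when at least two different bases occur in that column. The sketch does not reproduce the authors' methods. `segregating_sites` is a plain column scan that is linear in the input size; it is not the paper's expected-sublinear algorithm. `SegregatingSiteTracker` is a hypothetical per-column counting structure that handles one insertion or removal in O(m) time for sequences of length m without re-scanning the stored set. Both names are illustrative.

```python
from typing import Dict, List, Sequence, Set


def segregating_sites(seqs: Sequence[str]) -> List[int]:
    """Return 0-based columns in which at least two different bases occur.

    Assumes aligned, equal-length sequences. Runs in O(n*m) time for n
    sequences of length m, i.e. linear in the total input size.
    """
    if not seqs:
        return []
    m = len(seqs[0])
    if any(len(s) != m for s in seqs):
        raise ValueError("sequences must have equal length")
    sites = []
    for j in range(m):
        first = seqs[0][j]
        # any() stops at the first differing base, so segregating columns
        # are usually confirmed before the whole column is read.
        if any(s[j] != first for s in seqs[1:]):
            sites.append(j)
    return sites


class SegregatingSiteTracker:
    """Hypothetical dynamic structure (an illustration, not the paper's design).

    Each column keeps a count of every base seen there. A column is
    segregating while it holds two or more distinct bases, so updates touch
    only the m columns of the sequence being inserted or removed.
    """

    def __init__(self, length: int) -> None:
        self.length = length
        self.counts: List[Dict[str, int]] = [{} for _ in range(length)]
        self.segregating: Set[int] = set()

    def _refresh(self, j: int) -> None:
        # Re-classify column j from its number of distinct bases.
        if len(self.counts[j]) >= 2:
            self.segregating.add(j)
        else:
            self.segregating.discard(j)

    def insert(self, seq: str) -> None:
        if len(seq) != self.length:
            raise ValueError("sequence length mismatch")
        for j, base in enumerate(seq):
            col = self.counts[j]
            col[base] = col.get(base, 0) + 1
            self._refresh(j)

    def remove(self, seq: str) -> None:
        # Assumes seq was inserted earlier; validate before mutating so a
        # bad call cannot leave the counts half-updated.
        if len(seq) != self.length:
            raise ValueError("sequence length mismatch")
        if any(self.counts[j].get(b, 0) == 0 for j, b in enumerate(seq)):
            raise ValueError("sequence is not in the tracked set")
        for j, base in enumerate(seq):
            col = self.counts[j]
            col[base] -= 1
            if col[base] == 0:
                del col[base]
            self._refresh(j)

    def sites(self) -> List[int]:
        return sorted(self.segregating)


if __name__ == "__main__":
    data = ["ACGTA", "ACGTT", "AGGTA"]
    print(segregating_sites(data))  # [1, 4]
    tracker = SegregatingSiteTracker(5)
    for s in data:
        tracker.insert(s)
    print(tracker.sites())  # [1, 4]
    tracker.remove("AGGTA")
    print(tracker.sites())  # [4]
```

Removing `AGGTA` leaves only `C` in column 1, so that column stops being segregating. Column 4 still holds both `A` and `T`. In this sketch, only the columns of the changed sequence are revisited on each update.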

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms*
  • Base Sequence*
  • Genetic Variation
  • Genome*
  • Sequence Analysis, DNA