Dealing with repetitions in sequencing by hybridization

Comput Biol Chem. 2006 Oct;30(5):313-20. doi: 10.1016/j.compbiolchem.2006.05.002. Epub 2006 Aug 30.

Abstract

The biochemical experiment in DNA sequencing by hybridization (SBH) introduces errors. Some of them are random and disappear when the experiment is repeated. Others are systematic and arise from repeated probes within the target sequence. A good method for solving SBH problems must deal with both types of errors. In this work we propose a new hybrid genetic algorithm for isothermic and standard sequencing that incorporates the concept of structured combinations. The algorithm is then compared with other methods designed to handle errors that arise in the standard and isothermic SBH approaches. The DNA sequences used for testing are taken from GenBank. The test instances were divided into two groups. The first group consisted of sequences with positive and negative errors in the spectrum, at a rate of up to 20%, excluding errors coming from repetitions. The second group consisted of sequences containing repeated oligonucleotides, with additional errors of up to 5% added to the spectra. Our new method outperforms the best alternative procedures on both data sets. Moreover, the method produces solutions exhibiting an extremely high degree of similarity to the target sequences in the cases without repetitions, which is an important outcome for biologists. The spectra prepared from the sequences taken from GenBank are available on our website http://bio.cs.put.poznan.pl/.
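To make the two error sources concrete, the following is a minimal sketch, not the authors' algorithm or data format. It assumes the standard (fixed probe length) setting, in which the spectrum is the set of all length-k oligonucleotides occurring in the target. The function names (`kmers`, `ideal_spectrum`, `repeated_probes`, `add_errors`) and the toy sequence are hypothetical and chosen only for illustration. It shows why a repeated oligonucleotide acts as a systematic negative error (the spectrum is a set, so the second occurrence is lost) and how random negative and positive errors can be simulated on top of it.

```python
from collections import Counter


def kmers(sequence, k):
    """All length-k substrings of the sequence, in order, with duplicates."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def ideal_spectrum(sequence, k):
    """Error-free spectrum: the SET of length-k probes found in the sequence.

    Because it is a set, a probe that occurs several times is recorded once.
    """
    return set(kmers(sequence, k))


def repeated_probes(sequence, k):
    """Probes occurring more than once, mapped to their occurrence counts."""
    counts = Counter(kmers(sequence, k))
    return {probe: n for probe, n in counts.items() if n > 1}


def add_errors(spectrum, negatives, positives):
    """Simulate random errors: drop some probes, add some spurious ones."""
    return (set(spectrum) - set(negatives)) | set(positives)


# Hypothetical toy target; real instances are far longer.
target = "ACGTACGA"
k = 3

spectrum = ideal_spectrum(target, k)
missing_by_repetition = len(kmers(target, k)) - len(spectrum)
noisy = add_errors(spectrum, negatives={"GTA"}, positives={"TTT"})
```

Here the target has six overlapping 3-mers but only five distinct ones, because `ACG` occurs twice. That one missing element is present no matter how often the experiment is repeated, which is what distinguishes it from the random errors simulated by `add_errors`.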

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • DNA / chemistry*
  • DNA / genetics
  • Nucleic Acid Hybridization / methods*
  • Oligonucleotide Probes
  • Repetitive Sequences, Nucleic Acid*
  • Sequence Analysis, DNA / methods*

Substances

  • Oligonucleotide Probes
  • DNA