Complexity and approximability of double digest

Mark Cieliebak; Stephan Eidenbenz; Gerhard J Woeginger

doi:10.1142/s0219720005001016

J Bioinform Comput Biol. 2005 Apr;3(2):207-23. doi: 10.1142/s0219720005001016.

Authors

Mark Cieliebak¹, Stephan Eidenbenz, Gerhard J Woeginger

Affiliation

¹ Institute of Theoretical Computer Science, ETH Zurich, 8092 Zurich, Switzerland. cieliebak@inf.ethz.ch

PMID: 15852501
DOI: 10.1142/s0219720005001016

Abstract

We revisit the DOUBLE DIGEST problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes. We first show that DOUBLE DIGEST is strongly NP-complete, improving upon previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In the second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of DOUBLE DIGEST that model partial cleavage errors. We then show that most of these variations are hard to approximate. In the third part, we investigate variations with the additional restriction that coincident cut sites are disallowed, and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.
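To make the problem statement concrete, here is a minimal brute-force sketch in Python. It assumes the usual formulation, which the abstract does not spell out: the input is three multisets of fragment lengths, `a` and `b` from the two single-enzyme digests and `c` from the combined digest, and the task is to order `a` and `b` so that the union of their cut sites produces exactly the fragments in `c`. The function names and the `allow_coincident` flag are hypothetical and are not taken from the paper. The search is exponential in the input size, consistent with the hardness results above, so it is only useful for toy instances.

```python
from collections import Counter
from itertools import permutations


def cut_sites(fragments):
    """Return the cut positions (including the right end) implied by an ordering of fragment lengths."""
    sites, pos = set(), 0
    for length in fragments:
        pos += length
        sites.add(pos)
    return sites


def double_digest(a, b, c, allow_coincident=True):
    """Brute-force search for orderings of a and b whose combined cuts yield the multiset c.

    Returns a pair of orderings (tuples) or None if no arrangement exists.
    Assumes all fragment lengths are positive integers.
    """
    total = sum(a)
    if not (total == sum(b) == sum(c)):
        return None
    target = Counter(c)
    for pa in set(permutations(a)):
        sites_a = cut_sites(pa)
        for pb in set(permutations(b)):
            sites_b = cut_sites(pb)
            # In the restricted variant, the two enzymes may not share an interior cut site.
            if not allow_coincident and (sites_a & sites_b) - {total}:
                continue
            merged = sorted(sites_a | sites_b)
            fragments = [hi - lo for lo, hi in zip([0] + merged, merged)]
            if Counter(fragments) == target:
                return pa, pb
    return None
```

For example, `double_digest([2, 3, 5], [4, 6], [1, 2, 2, 5])` finds a valid arrangement, whereas `double_digest([2, 2], [2, 2], [2, 2], allow_coincident=False)` returns `None`, because the only arrangement forces both enzymes to cut at position 2.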

Publication types

Evaluation Study

MeSH terms

Algorithms*
Base Sequence
Computer Simulation
DNA Restriction Enzymes / chemistry*
DNA Restriction Enzymes / genetics
Genetic Markers / genetics
Models, Chemical
Models, Genetic
Models, Statistical
Molecular Sequence Data
Restriction Mapping / methods*
Sequence Alignment
Sequence Analysis, DNA / methods*

Substances

Genetic Markers
DNA Restriction Enzymes