keeSeek: searching distant non-existing words in genomes for PCR-based applications

Marco Falda; Paolo Fontana; Luisa Barzon; Stefano Toppo; Enrico Lavezzo

doi:10.1093/bioinformatics/btu312

Bioinformatics. 2014 Sep 15;30(18):2662-4. doi: 10.1093/bioinformatics/btu312. Epub 2014 May 27.

Authors

Marco Falda¹, Paolo Fontana¹, Luisa Barzon¹, Stefano Toppo¹, Enrico Lavezzo¹

Affiliation

¹ Department of Molecular Medicine, University of Padova, Padova, I-35131, Italy and Department of Computational Biology, Edmund Mach Foundation, S. Michele All'Adige, I-38010 (TN), Italy.

PMID: 24867942

Abstract

The search for short words that are absent from the genome of one or more organisms (neverwords, also known as nullomers) is attracting growing interest because of their potential impact on recent molecular biology applications. keeSeek finds absent sequences with primer-like features, which can be used as unique labels for exogenously inserted DNA fragments so that their exact position in the genome can be recovered using PCR techniques. The main differences with respect to previously developed tools for neverword generation are (i) calculation of the distance from the reference genome, in terms of number of mismatches, and selection of the most distant sequences, which have a low probability of annealing nonspecifically; and (ii) application of a series of filters to discard candidates not suitable for use as PCR primers. keeSeek is implemented in C++ and CUDA (Compute Unified Device Architecture) to run in a General-Purpose Computing on Graphics Processing Units (GPGPU) environment.
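
The two ideas in the abstract, distance from the reference genome in mismatches and primer-suitability filtering, can be illustrated with a small brute-force sketch in Python. This is not keeSeek's implementation (which is C++/CUDA and designed for real genomes); it only shows the concepts on a toy sequence. The function names, the choice to scan both strands, and the filter thresholds (GC fraction between 0.4 and 0.6, no run of more than three identical bases) are assumptions made for illustration; the abstract does not specify which filters keeSeek applies.

```python
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def genome_kmers(genome, k):
    """Collect every k-mer on both strands (both-strand scan is an assumption)."""
    present = set()
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            present.add(strand[i:i + k])
    return present


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def neverwords_with_distance(genome, k):
    """Map each absent k-mer to its minimum mismatch count against the genome.

    Brute force over all 4**k words; only practical for very small k.
    """
    present = genome_kmers(genome, k)
    if not present:
        raise ValueError("genome is shorter than k")
    result = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        if word not in present:
            result[word] = min(hamming(word, p) for p in present)
    return result


def passes_primer_filters(seq, gc_min=0.4, gc_max=0.6, max_run=3):
    """Hypothetical primer filters: GC fraction window and homopolymer limit."""
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if not gc_min <= gc <= gc_max:
        return False
    return not any(base * (max_run + 1) in seq for base in ALPHABET)


if __name__ == "__main__":
    distances = neverwords_with_distance("ACGTACGT", 3)
    best = max(distances.values())
    most_distant = sorted(w for w, d in distances.items() if d == best)
    print(len(distances), best, most_distant[:5])
```

Selecting the words with the largest minimum distance corresponds to point (i) of the abstract: the further a candidate is from every genomic word, the less likely it is to anneal at an unintended site. A filter such as `passes_primer_filters` would then be applied to the surviving candidates, corresponding to point (ii).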

Availability and implementation: Freely available under the Q Public License at http://www.medcomp.medicina.unipd.it/main_site/doku.php?id=keeseek.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Arabidopsis / genetics
Base Sequence
DNA Primers / genetics
Data Mining
Genomics / methods*
Mycobacterium tuberculosis / genetics
Polymerase Chain Reaction
Sequence Analysis, DNA
Software*

Substances

DNA Primers