A tool for aligning very similar DNA sequences

Comput Appl Biosci. 1997 Feb;13(1):75-80. doi: 10.1093/bioinformatics/13.1.75.

Abstract

We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information.
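
The computational problem stated in the abstract (locate the region of the longer sequence closest to the shorter one, then list the single-nucleotide changes) can be illustrated with a minimal sketch. It uses a textbook unit-cost dynamic program for approximate substring matching, not the sim3 algorithm itself, whose details the abstract does not give. The function name `align_to_region` and the edit-tuple layout are assumptions made for illustration. This sketch takes time and memory proportional to the product of the two lengths, so it would not reach the speeds reported above, which depend on there being very few differences.

```python
def align_to_region(short, long_):
    """Find a region of long_ with minimum unit-cost edit distance to short.

    Returns (start, end, distance, edits), where long_[start:end] is the
    matched region and edits lists the changes converting short to it.
    Illustrative only; this is not the sim3 implementation.
    """
    m, n = len(short), len(long_)
    # dist[i][j]: fewest edits converting short[:i] into some substring
    # of long_ that ends at position j.
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i  # nothing of long_ used: delete all i characters
    # Row 0 stays all zeros, so the region may start anywhere in long_.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = dist[i - 1][j - 1] + (short[i - 1] != long_[j - 1])
            dele = dist[i - 1][j] + 1   # drop a character of short
            ins = dist[i][j - 1] + 1    # add a character of long_ to short
            dist[i][j] = min(sub, dele, ins)

    # The region may end anywhere: take the first best cell in the last row.
    end = min(range(n + 1), key=lambda j: dist[m][j])

    # Trace back from (m, end) to recover the region start and the edits.
    edits = []
    i, j = m, end
    while i > 0:
        mismatch = j > 0 and short[i - 1] != long_[j - 1]
        if j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            if mismatch:
                edits.append(("substitute", i - 1, short[i - 1], long_[j - 1]))
            i -= 1
            j -= 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", i - 1, short[i - 1], None))
            i -= 1
        else:
            edits.append(("insert", i, None, long_[j - 1]))
            j -= 1
    edits.reverse()
    return j, end, dist[m][end], edits


if __name__ == "__main__":
    start, end, distance, edits = align_to_region("ACGTTGCA", "TTTTACGTAGCATTTT")
    print(start, end, distance, edits)
```

Each edit is a tuple of operation, 0-based position in the shorter sequence, the nucleotide in the shorter sequence (if any), and the nucleotide in the matched region (if any). Every operation costs one, which matches the abstract's framing of the differences as sequencing errors rather than evolutionary events.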

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Aldehyde Dehydrogenase / genetics
  • Algorithms
  • Amino Acid Sequence
  • Base Sequence
  • Computer Communication Networks
  • DNA / genetics*
  • Evaluation Studies as Topic
  • Genome, Human
  • Humans
  • Models, Genetic
  • Molecular Sequence Data
  • Sequence Alignment / methods*
  • Sequence Alignment / statistics & numerical data
  • Sequence Homology, Amino Acid
  • Sequence Homology, Nucleic Acid
  • Software*

Substances

  • DNA
  • Aldehyde Dehydrogenase