Stochastic pairwise alignments

Bioinformatics. 2002;18 Suppl 2:S153-60. doi: 10.1093/bioinformatics/18.suppl_2.s153.

Abstract

Motivation: The level of sequence conservation between related nucleic acids or proteins often varies considerably along the sequence. Both regions with high variability (mutational hot-spots) and regions of almost perfect sequence identity may occur in the same pair of molecules. The reliability of an alignment therefore strongly depends on the level of local sequence similarity. Especially in regions of high variability, many alignments of almost equal quality exist, and the optimal alignment is highly arbitrary.

Results: We discuss two approaches to the inherent ambiguity of the alignment problem, both based on computing the partition function over all canonical pairwise alignments. The ensemble of possible alignments can be described by the probabilities P(i,j) of a match between position i in the first sequence and position j in the second. Alternatively, we introduce a probabilistic backtracking procedure that generates ensembles of suboptimal alignments with correct statistical weights. A comparison between structure-based alignments and large samples of stochastic alignments shows that the ensemble contains correct alignments with significant probabilities even in cases where the optimal alignment deviates considerably from the structural alignment. Ensembles of suboptimal alignments obtained by stochastic backtracking can be used as input to any bioinformatics method based on pairwise alignment, providing reliability information that is not available from a single optimal alignment.
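The two ideas in the Results paragraph, a partition function over alignments from which match probabilities P(i,j) follow, and stochastic backtracking through the same dynamic-programming matrix, can be illustrated with a short Python sketch. It is not the probA code: the scoring values (match +2, mismatch -1, linear gap -2), the Boltzmann factor exp(β·score) with β = 1, and all function names are assumptions made for illustration, and the simple recursion sums over all alignments rather than restricting the sum to canonical ones as the paper does. `forward` and `backward` fill prefix and suffix partition functions, `match_probabilities` combines them, and `sample_alignment` draws an alignment with probability proportional to its Boltzmann weight by choosing each backtracking step in proportion to its contribution to the current matrix entry.

```python
import math
import random

# Minimal sketch with assumed parameters: linear gap cost, no canonical-alignment restriction.
# Illustrative scoring scheme (an assumption, not probA's parameters).
MATCH, MISMATCH, GAP = 2.0, -1.0, -2.0
BETA = 1.0  # scales scores into Boltzmann weights exp(BETA * score)
GAP_W = math.exp(BETA * GAP)


def pair_weight(a, b):
    """Boltzmann weight of a column aligning residue a with residue b."""
    return math.exp(BETA * (MATCH if a == b else MISMATCH))


def forward(x, y):
    """Z[i][j] = summed weight of all alignments of x[:i] with y[:j]."""
    n, m = len(x), len(y)
    Z = [[0.0] * (m + 1) for _ in range(n + 1)]
    Z[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            z = 0.0
            if i > 0 and j > 0:  # last column: x_i aligned with y_j
                z += Z[i - 1][j - 1] * pair_weight(x[i - 1], y[j - 1])
            if i > 0:  # last column: x_i against a gap
                z += Z[i - 1][j] * GAP_W
            if j > 0:  # last column: y_j against a gap
                z += Z[i][j - 1] * GAP_W
            Z[i][j] = z
    return Z


def backward(x, y):
    """Zb[i][j] = summed weight of all alignments of x[i:] with y[j:]."""
    n, m = len(x), len(y)
    Zb = [[0.0] * (m + 1) for _ in range(n + 1)]
    Zb[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            z = 0.0
            if i < n and j < m:
                z += Zb[i + 1][j + 1] * pair_weight(x[i], y[j])
            if i < n:
                z += Zb[i + 1][j] * GAP_W
            if j < m:
                z += Zb[i][j + 1] * GAP_W
            Zb[i][j] = z
    return Zb


def match_probabilities(x, y):
    """P[i, j] (1-based) = probability that x_i is aligned with y_j."""
    Z, Zb = forward(x, y), backward(x, y)
    total = Z[len(x)][len(y)]
    return {
        (i, j): Z[i - 1][j - 1] * pair_weight(x[i - 1], y[j - 1]) * Zb[i][j] / total
        for i in range(1, len(x) + 1)
        for j in range(1, len(y) + 1)
    }


def sample_alignment(x, y, Z, rng):
    """Stochastic backtracking: returns an alignment with probability weight / Z."""
    i, j, cols = len(x), len(y), []
    while i > 0 or j > 0:
        steps, weights = [], []  # (di, dj) = residues consumed from x and y
        if i > 0 and j > 0:
            steps.append((1, 1))
            weights.append(Z[i - 1][j - 1] * pair_weight(x[i - 1], y[j - 1]))
        if i > 0:
            steps.append((1, 0))
            weights.append(Z[i - 1][j] * GAP_W)
        if j > 0:
            steps.append((0, 1))
            weights.append(Z[i][j - 1] * GAP_W)
        # The weights sum to Z[i][j]; each step is drawn with its share of Z[i][j].
        di, dj = rng.choices(steps, weights=weights)[0]
        cols.append((x[i - 1] if di else "-", y[j - 1] if dj else "-"))
        i, j = i - di, j - dj
    cols.reverse()  # columns were collected from the end of the alignment
    return "".join(a for a, _ in cols), "".join(b for _, b in cols)


if __name__ == "__main__":
    x, y = "GATTACA", "GCATGCA"
    print("P(1,1) =", round(match_probabilities(x, y)[1, 1], 3))
    rng = random.Random(0)
    print(*sample_alignment(x, y, forward(x, y), rng), sep="\n")
```

Sampling many alignments with `sample_alignment` and counting how often x_i sits in the same column as y_j yields a Monte Carlo estimate of P(i,j) that converges to the value from `match_probabilities`. For realistic sequence lengths the raw weights overflow or underflow double precision, so a practical implementation would rescale the matrices or work with logarithms.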

Availability: The software described in this contribution is available for download at http://www.tbi.univie.ac.at/~ulim/probA/

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Models, Chemical*
  • Models, Genetic
  • Models, Statistical
  • Proteins / analysis*
  • Proteins / chemistry*
  • Proteins / genetics
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Stochastic Processes

Substances

  • Proteins