Aligning multiple genomic sequences with the threaded blockset aligner

Genome Res. 2004 Apr;14(4):708-15. doi: 10.1101/gr.1933104.

Abstract

We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
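
The projection idea can be made concrete with a small sketch. It is an illustration only, not TBA or MULTIZ code, and it assumes a simplified reading of a threaded blockset: a set of gapped alignment blocks in which every position of every sequence falls in exactly one block. The abstract does not spell out this definition, and the names Row, project, covers_exactly_once, and aligned_pairs, as well as the toy sequences, are hypothetical. Projecting onto a reference here means keeping the blocks that contain the reference and ordering them by reference position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One sequence's row in a block (hypothetical representation)."""
    start: int  # 0-based start in the ungapped sequence
    text: str   # aligned text; '-' marks a gap

    @property
    def end(self) -> int:
        # End position (exclusive) in the ungapped sequence.
        return self.start + sum(c != "-" for c in self.text)


# A block maps sequence name -> Row; rows in one block have equal length.
Block = dict[str, Row]


def project(blocks: list[Block], ref: str) -> list[Block]:
    """Keep the blocks that contain `ref`, ordered by position in `ref`."""
    return sorted((b for b in blocks if ref in b), key=lambda b: b[ref].start)


def covers_exactly_once(blocks: list[Block], ref: str, length: int) -> bool:
    """Check that the projected blocks tile `ref` with no gap or overlap."""
    pos = 0
    for b in project(blocks, ref):
        if b[ref].start != pos:
            return False
        pos = b[ref].end
    return pos == length


def aligned_pairs(blocks: list[Block], a: str, b: str) -> set[tuple[int, int]]:
    """Collect (position in a, position in b) pairs sharing a column."""
    pairs = set()
    for blk in blocks:
        if a in blk and b in blk:
            i, j = blk[a].start, blk[b].start
            for ca, cb in zip(blk[a].text, blk[b].text):
                if ca != "-" and cb != "-":
                    pairs.add((i, j))
                i += (ca != "-")
                j += (cb != "-")
    return pairs


# Toy data: "human" has 6 bases, "mouse" has 4.
blocks = [
    {"human": Row(0, "ACG"), "mouse": Row(0, "A-G")},  # shared block
    {"human": Row(3, "TAC")},                          # human-only block
    {"mouse": Row(2, "TT")},                           # mouse-only block
]

human_view = project(blocks, "human")
mouse_view = project(blocks, "mouse")
```

Because both views are drawn from the same set of blocks, the aligned position pairs seen from the human side and from the mouse side agree by construction, which is the consistency property the abstract attributes to projections.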

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Cats
  • Cattle
  • Computational Biology / methods
  • Computational Biology / standards
  • Computational Biology / trends
  • Computer Simulation
  • Dogs
  • Evaluation Studies as Topic
  • Evolution, Molecular
  • Genes, Homeobox / genetics
  • Genes, fos / genetics
  • Genome
  • Genome, Human
  • Humans
  • Mice
  • Molecular Sequence Data
  • Multigene Family / genetics
  • Rats
  • Ribosomal Proteins / genetics
  • Sequence Alignment / methods*
  • Sequence Alignment / standards
  • Sequence Alignment / trends*
  • Software / trends*

Substances

  • Ribosomal Proteins
  • ribosomal protein L34