Large-Scale Sequence Comparison

Methods Mol Biol. 2017:1525:191-224. doi: 10.1007/978-1-4939-6622-6_9.

Abstract

Genomic databases hold millions of sequences, and categorizing them according to their structural and functional roles is an important task. Sequence comparison is a prerequisite for properly categorizing both DNA and protein sequences, and it helps in assigning a putative or hypothetical structure and function to a given sequence. Of the various methods available for comparing sequences, alignment is the foremost, both for short sequences and for large-scale genome comparison. Various tools are available for pairwise comparison of large sequences; the best-known ones either perform a global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information on sequence comparison, followed by a description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, followed by a description and overview of tools for genome comparison, including LAGAN, MUMmer, BLASTZ, and AVID.

Keywords: Accepted point mutation; BLOcks SUbstitution Matrix; Conservative substitutions; Dynamic programming algorithm; E value; Gap penalty; Global and local alignment; Heuristic approach; Homology; Indels; Orthologs; Paralogs; Scoring matrix; Substitutions.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Software