Large-Scale Sequence Comparison

Devi Lal; Mansi Verma

doi:10.1007/978-1-4939-6622-6_9

Methods Mol Biol. 2017;1525:191-224. doi: 10.1007/978-1-4939-6622-6_9.

Authors

Devi Lal¹, Mansi Verma²

Affiliations

¹ Ramjas College, University of Delhi, New Delhi, 110 007, India.
² Sri Venkateswara College, University of Delhi (South Campus), Benito Juarez Road, Dhaula Kuan, New Delhi, 110 021, India. mansiverma20@gmail.com.

PMID: 27896723
DOI: 10.1007/978-1-4939-6622-6_9

Abstract

Millions of sequences are deposited in genomic databases, and categorizing them according to their structural and functional roles is an important task. Sequence comparison is a prerequisite for proper categorization of both DNA and protein sequences, and helps in assigning a putative or hypothetical structure and function to a given sequence. Of the various methods available for comparing sequences, alignment is the foremost, both for short sequences and for large-scale genome comparison. Various tools are available for pairwise comparison of large sequences; the best known either perform a global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information regarding sequence comparison, followed by a description of the PAM and BLOSUM scoring matrices that form its basis. We also give a practical overview of currently available methods such as BLAST and FASTA, and then describe tools available for genome comparison, including LAGAN, MUMmer, BLASTZ, and AVID.

Keywords: Accepted point mutation; BLOcks SUbstitution Matrix; Conservative substitutions; Dynamic programming algorithm; E value; Gap penalty; Global and local alignment; Heuristic approach; Homology; Indels; Orthologs; Paralogs; Scoring matrix; Substitutions.
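
The distinction the abstract draws between global and local alignment can be illustrated with a minimal dynamic programming sketch. It is not taken from the chapter: the function name `alignment_score` is hypothetical, and the flat match/mismatch scores and linear gap penalty are illustrative assumptions standing in for a PAM or BLOSUM scoring matrix. The global branch follows the Needleman-Wunsch recurrence and the local branch the Smith-Waterman recurrence; only the score is returned, with no traceback.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of sequences a and b by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style), end to end.
    local=True:  local alignment (Smith-Waterman style), best substring pair.
    The scoring values are illustrative assumptions, not PAM/BLOSUM entries.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment charges for leading gaps in either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0  # highest cell seen so far; only used for local alignment
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]
```

With these assumed scores, `alignment_score("ACGT", "AGT")` gives 1 (three matches and one gap), while `alignment_score("TTTACGTAAA", "GGACGTCC", local=True)` gives 4, the score of the shared `ACGT` core with the unrelated flanks ignored. The table is quadratic in time and memory, which is why heuristic tools such as BLAST and FASTA are used for database-scale searches.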

MeSH terms

Algorithms
Computational Biology / methods*
Sequence Alignment
Sequence Analysis, Protein / methods*
Software