Algorithms, applications, and challenges of protein structure alignment

Adv Protein Chem Struct Biol. 2014:94:121-75. doi: 10.1016/B978-0-12-800168-4.00005-6.

Abstract

As a fundamental problem in computational structural biology, protein structure alignment has attracted the attention of the community for more than 20 years. Pairwise structure alignment measures the similarity between two proteins, a first step for homology search and fold space construction, whereas multiple structure alignment helps to reveal evolutionary conservation and divergence within a family of protein structures. Structure alignment is an NP-hard problem and is computationally tractable only through heuristics. Three levels of heuristics have been proposed for pairwise structure alignment: the representation of protein structure, the treatment of the protein as a rigid or a flexible body, and the scoring functions together with the search algorithms used for the alignment. Multiple structure alignment adds a fourth level of heuristics, which concerns how to merge all input structures into a single multiple structure alignment. In this review, we first present a brief survey of current methods for pairwise and multiple protein structure alignment, focusing on those that are publicly available as web servers. We then discuss in more detail recent advances in new approaches that increase pairwise alignment accuracy and that merge input structures into a multiple structure alignment efficiently and reliably. Finally, besides broadening the range of applications of structure alignment in template-based protein prediction, we list several open problems that remain to be solved, such as the alignment of large complexes and fast database search.
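
To make the notions of a structure representation and a scoring function concrete, the following minimal sketch scores two already-aligned structures by their distance RMSD (dRMSD), which compares intra-structure distance matrices and therefore needs no superposition. It is an illustration only and is not taken from the review: the C-alpha-only representation, the function names `pairwise_distances` and `distance_rmsd`, and the toy coordinates are all assumptions, and the residue correspondence is taken as given rather than searched for.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return all intra-structure distances between pairs of points (i < j)."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(coords_a, coords_b):
    """Distance RMSD between two equally long, already-aligned coordinate lists.

    Residue i of coords_a is assumed to correspond to residue i of coords_b.
    The score is invariant under rotation and translation of either structure.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of aligned residues")
    if len(coords_a) < 2:
        raise ValueError("at least two aligned residues are required")
    dist_a = pairwise_distances(coords_a)
    dist_b = pairwise_distances(coords_b)
    squared_error = sum((x - y) ** 2 for x, y in zip(dist_a, dist_b))
    return math.sqrt(squared_error / len(dist_a))


# Hypothetical toy "C-alpha traces" (coordinates in angstroms), for illustration only.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

# The same shape rotated 90 degrees about the z axis and then translated.
structure_b = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in structure_a]

# A variant in which the last residue is displaced out of the plane.
structure_c = structure_a[:3] + [(0.0, 3.8, 3.0)]

same_shape_score = distance_rmsd(structure_a, structure_b)
changed_shape_score = distance_rmsd(structure_a, structure_c)
```

A rigidly moved copy scores approximately zero, while a structure with a displaced residue scores higher. A full alignment method would additionally have to search for the residue correspondence, which is where the search heuristics mentioned in the abstract come in.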

Keywords: Multiple protein structure alignment; Pairwise protein structure alignment; Protein structural alphabet; Protein structure; Structural alphabet substitution matrix.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Molecular Sequence Data
  • Protein Conformation
  • Proteins / chemistry*
  • Sequence Alignment*

Substances

  • Proteins