Correlation between protein sequence similarity and crystallization reagents in the biological macromolecule crystallization database

Int J Mol Sci. 2012;13(8):9514-9526. doi: 10.3390/ijms13089514. Epub 2012 Jul 27.

Abstract

The number of protein structure entries has grown far more slowly than the number of sequence entries. This is partly due to the bottleneck in obtaining diffraction-quality protein crystals for structure determination by X-ray crystallography. The first step toward protein crystallization is to identify suitable chemical reagents, which is not an easy task: exhaustive trial-and-error tests of numerous combinations of reagents mixed with the protein solution are usually needed to find workable crystallization conditions. Any approach that helps identify suitable reagents for protein crystallization is therefore valuable. In this paper, we present an analysis of the relationship between protein sequence similarity and crystallization reagents, based on information from existing databases. We extracted reagent and sequence information from the Biological Macromolecule Crystallization Database (BMCD) and the Protein Data Bank (PDB), classified the proteins into clusters according to sequence similarity, and statistically analyzed the relationship between sequence similarity and crystallization reagents. The results showed a pronounced positive correlation between the two. On the basis of this correlation, it is possible to predict chemical reagents that are likely to be suitable for crystallization screens of a specific protein.
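The kind of analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not state which similarity measure, clustering procedure, or statistic was used. The sketch assumes a crude string-based similarity (`difflib.SequenceMatcher`) as a stand-in for a real alignment score, the Jaccard index as a measure of shared reagents, and a Pearson coefficient over all protein pairs. The function names and the three toy entries are hypothetical and invented purely for illustration; they are not taken from BMCD or PDB.

```python
from difflib import SequenceMatcher
from itertools import combinations
from math import sqrt


def sequence_similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reagent_overlap(r1: set, r2: set) -> float:
    """Jaccard index of two reagent sets (assumed overlap measure)."""
    union = r1 | r2
    return len(r1 & r2) / len(union) if union else 0.0


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def similarity_vs_overlap(entries):
    """For every protein pair, collect sequence similarity and reagent overlap.

    entries maps a protein name to (sequence, set of reagent names).
    """
    sims, overlaps = [], []
    for (_, (s1, r1)), (_, (s2, r2)) in combinations(entries.items(), 2):
        sims.append(sequence_similarity(s1, s2))
        overlaps.append(reagent_overlap(r1, r2))
    return sims, overlaps


# Hypothetical toy entries, invented for illustration only.
toy = {
    "protA": ("MKTAYIAKQRQISFVKSHFSRQ", {"PEG 4000", "NaCl", "Tris"}),
    "protB": ("MKTAYIAKQRQISFVKSHFSRE", {"PEG 4000", "NaCl", "HEPES"}),
    "protC": ("GSHMLEDPVVLQRRDWENPGVTQ", {"ammonium sulfate", "citrate"}),
}

sims, overlaps = similarity_vs_overlap(toy)
r = pearson(sims, overlaps)
print(f"Pearson r between sequence similarity and reagent overlap: {r:.2f}")
```

On real data, the pairwise similarity would more plausibly come from a dedicated alignment or clustering tool, and the correlation would be assessed over many entries rather than three; the sketch only shows the shape of the computation.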

Keywords: X-ray crystallography; crystallization reagents; molecular structure; protein crystallization; protein sequence similarity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Crystallization
  • Crystallography, X-Ray
  • Databases, Protein*
  • Humans
  • Multiprotein Complexes / chemistry*
  • Proteins / chemistry*
  • Sequence Homology

Substances

  • Multiprotein Complexes
  • Proteins