Simplified quality assessment for small-molecule ligands in the Protein Data Bank

Structure. 2022 Feb 3;30(2):252-262.e4. doi: 10.1016/j.str.2021.10.003. Epub 2022 Jan 12.

Abstract

More than 70% of the experimentally determined macromolecular structures in the Protein Data Bank (PDB) contain small-molecule ligands. Quality indicators of ∼643,000 ligands present in ∼106,000 PDB X-ray crystal structures have been analyzed. Ligand quality varies greatly with regard to goodness of fit between ligand structure and experimental data, deviations in bond lengths and angles from known chemical structures, and inappropriate interatomic clashes between the ligand and its surroundings. Based on principal component analysis, correlated quality indicators of ligand structure have been aggregated into two largely orthogonal composite indicators measuring goodness of fit to experimental data and deviation from ideal chemical structure. Ranking of the composite quality indicators across the PDB archive enabled construction of a uniformly distributed composite ranking score. This score is implemented at RCSB.org to compare chemically identical ligands in distinct PDB structures with easy-to-interpret two-dimensional ligand quality plots, allowing PDB users to quickly assess ligand structure quality and select the best exemplars.
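The aggregation and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic two-column input, and the choice to orient the component by its first loading are all assumptions made for illustration. The sketch standardizes a group of correlated indicators, projects them onto their first principal component, and converts the resulting composite values into percentile ranks, which are uniformly distributed when there are no ties.

```python
import numpy as np


def first_pc_score(indicators):
    """Project standardized indicator columns onto their first principal component.

    `indicators` is an (n_ligands, n_indicators) array of correlated quality
    indicators (hypothetical input; the paper's actual indicators are not
    specified here).
    """
    x = np.asarray(indicators, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    axis = vt[0]
    # The sign of a principal axis is arbitrary; as an assumption, orient it
    # so that the first indicator has a positive loading.
    if axis[0] < 0:
        axis = -axis
    return z @ axis


def percentile_rank(scores):
    """Map scores to (0, 1] by rank; uniform when the scores contain no ties."""
    s = np.asarray(scores, dtype=float)
    ranks = s.argsort().argsort() + 1  # 1 = smallest score
    return ranks / s.size


# Synthetic example: two strongly correlated indicators for 200 ligands.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    base + 0.1 * rng.normal(size=200),
])
composite = first_pc_score(demo)
ranking = percentile_rank(composite)
```

Applying the same two functions to a second, separate group of indicators would give the second axis of a two-dimensional plot like the one the abstract describes. Whether a high rank means better or worse quality depends on the orientation chosen for each component.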

Keywords: PDB; Protein Data Bank; RCSB PDB; composite ranking score; ligand quality indicator; ligand structure; ligand structure quality; multivariate analysis; principal component analysis; small-molecule ligand.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Databases, Protein
  • Ligands
  • Models, Molecular
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Small Molecule Libraries / pharmacology*

Substances

  • Ligands
  • Proteins
  • Small Molecule Libraries