Standardizing macromolecular structure files: further efforts are needed

Trends Biochem Sci. 2023 Jul;48(7):590-596. doi: 10.1016/j.tibs.2023.03.002. Epub 2023 Apr 6.

Abstract

Investigating large datasets of biological information with automated procedures may offer opportunities to advance knowledge. Recently, tremendous improvements in structural biology have allowed the number of structures in the Protein Data Bank (PDB) archive to increase rapidly, in particular those of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-associated proteins. However, automated analysis of these structures can be hampered by the nonuniform descriptors that authors use in some records of the PDB and PDBx/mmCIF files. In this opinion article, we highlight the difficulties encountered in automating the analysis of hundreds of structures. We suggest that further standardization of the description of these molecular entities and of their attributes, generalized to the macromolecular structures contained in the PDB, might generate files better suited to automated analysis of large numbers of structures.
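
The kind of difficulty described above can be illustrated with a minimal sketch. It is not taken from the article: the function names, the entry labels, and the descriptor strings are hypothetical, chosen only to show how free-text descriptors that differ in case, spacing, or punctuation defeat exact string matching, and how far simple text normalization can and cannot go.

```python
import re
from collections import defaultdict


def normalize_descriptor(text):
    """Lowercase the text, turn punctuation runs into single spaces, trim the ends."""
    lowered = text.lower()
    collapsed = re.sub(r"[^a-z0-9]+", " ", lowered)
    return collapsed.strip()


def group_by_descriptor(entries):
    """Map each normalized descriptor to the sorted labels of the entries that use it."""
    groups = defaultdict(list)
    for label, descriptor in entries:
        groups[normalize_descriptor(descriptor)].append(label)
    return {key: sorted(labels) for key, labels in groups.items()}


# Hypothetical (label, descriptor) pairs, written for illustration only.
# They are NOT copied from specific PDB entries.
EXAMPLE_ENTRIES = [
    ("entry_a", "Spike glycoprotein"),
    ("entry_b", "SPIKE GLYCOPROTEIN"),
    ("entry_c", "spike  glycoprotein,"),
    ("entry_d", "Surface glycoprotein"),
]

# Exact matching treats every spelling variant as a different molecule.
raw_variants = {descriptor for _, descriptor in EXAMPLE_ENTRIES}

# Normalization merges variants that differ only in case, spacing, or punctuation.
grouped = group_by_descriptor(EXAMPLE_ENTRIES)

print(len(raw_variants), "distinct raw strings")
print(len(grouped), "groups after normalization")
for key, labels in grouped.items():
    print(key, "->", labels)
```

In this sketch, normalization merges the three spelling variants but leaves "Surface glycoprotein" in its own group, even though an author may intend the same protein by it. Text clean-up applied after the fact cannot resolve differences in naming, which is why standardized descriptors at deposition would be more robust.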

Keywords: PDB archive; antibody; automatic data analysis; spike protein.

Publication types

  • Review

MeSH terms

  • COVID-19*
  • Databases, Protein
  • Humans
  • Molecular Structure
  • Protein Conformation
  • Proteins / chemistry
  • SARS-CoV-2

Substances

  • Proteins