Improved Assessment of Globularity of Protein Structures and the Ellipsoid Profile of the Biological Assemblies from the PDB

Biomolecules. 2023 Feb 17;13(2):385. doi: 10.3390/biom13020385.

Abstract

In this paper, we present an update to the ellipsoid profile algorithm (EP), a simple technique for the measurement of the globularity of protein structures without the calculation of molecular surfaces. The globularity property is understood in this context as the ability of the molecule to fill a minimum volume enclosing ellipsoid (MVEE) that approximates its assumed globular shape. The more of the interior of this ellipsoid is occupied by the atoms of the protein, the better are its globularity metrics. These metrics are derived from the comparison of the volume of the voxelized representation of the atoms and the volume of all voxels that can fit inside that ellipsoid (a uniform unit Å cube lattice). The so-called ellipsoid profile shows how the globularity changes with the distance from the center. Two of its values, the so-called ellipsoid indexes, are used to classify the structure as globular, semi-globular or non-globular. Here, we enhance the workflow of the EP algorithm via an improved outlier detection subroutine based on principal component analysis. It is capable of robust distinguishing between the dense parts of the molecules and, for example, disordered chain fragments fully exposed to the solvent. The PCA-based method replaces the current approach based on kernel density estimation. The improved EP algorithm was tested on 2124 representatives of domain superfamilies from SCOP 2.08. The second part of this work is dedicated to the survey of globularity of 3594 representatives of biological assemblies from molecules currently deposited in the PDB and analyzed by the 3DComplex database (monomers and complexes up to 60 chains).

Keywords: bioinformatics; biological assembly; bounding ellipsoid; globularity; kernel density; nearest neighbor search; principal component analysis; protein complex; protein domain.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Databases, Factual
  • Proteins* / chemistry

Substances

  • Proteins