Bioinformatic analysis of subfamily-specific regions in 3D-structures of homologs to study functional diversity and conformational plasticity in protein superfamilies

Comput Struct Biotechnol J. 2021 Feb 23:19:1302-1311. doi: 10.1016/j.csbj.2021.02.005. eCollection 2021.

Abstract

Local 3D-structural differences in homologous proteins contribute to the functional diversity observed in a superfamily, but have so far received little attention because bioinformatic analysis has usually been carried out at the level of amino acid sequences. We have developed Zebra3D, the first-of-its-kind bioinformatic software for systematic analysis of 3D-alignments of protein families using machine learning. The new tool identifies subfamily-specific regions (SSRs): patterns of local 3D-structure (i.e. single residues, loops, or secondary structure fragments) that are spatially equivalent within families/subfamilies but differ among them, and thus can be associated with functional diversity and function-related conformational plasticity. Bioinformatic analysis of protein superfamilies by Zebra3D can be used to study 3D-determinants of catalytic activity and specific accommodation of ligands, to help prepare focused libraries for directed evolution, to assist the development of chimeric enzymes with novel properties by exchange of equivalent regions between homologs, and to characterize plasticity in binding sites. A companion Mustguseal web-server is available to automatically construct a 3D-alignment of functionally diverse proteins, thus reducing the minimal input required to operate Zebra3D to a single PDB code. The combined Zebra3D + Mustguseal approach provides the opportunity to systematically explore the value of SSRs in superfamilies and to use this information for protein design and drug discovery. The software is available open-access at https://biokinet.belozersky.msu.ru/Zebra3D.
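The SSR idea described above (local fragments that are spatially equivalent within a subfamily but different between subfamilies) can be illustrated with a toy sketch. It is not the Zebra3D implementation: the coordinates are invented, the helper names (`rmsd`, `dbscan`, `shift`) and the thresholds `eps` and `min_pts` are hypothetical, and a plain DBSCAN over pairwise RMSD is used only as one plausible instance of the density-based clustering named in the keywords. The fragments are assumed to be already superimposed by a 3D-alignment.

```python
import math

def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) points.
    Assumes the fragments are already superimposed by the 3D-alignment."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def dbscan(dist, eps, min_pts):
    """Plain DBSCAN on a precomputed distance matrix.
    Returns one label per item; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [k for k in range(n) if dist[j][k] <= eps]
            if len(nb) >= min_pts:   # j is a core point, so keep expanding
                queue.extend(nb)
    return labels

def shift(frag, d):
    """Hypothetical helper: small displacement along x to mimic structural noise."""
    return [(x + d, y, z) for x, y, z in frag]

# Invented C-alpha coordinates of one equivalent 3-residue loop in two conformations.
conf_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 6.0, 0.0)]

fragments = {
    "homolog_1": shift(conf_a, 0.0),
    "homolog_2": shift(conf_a, 0.2),
    "homolog_3": shift(conf_a, -0.2),
    "homolog_4": shift(conf_b, 0.0),
    "homolog_5": shift(conf_b, 0.2),
    "homolog_6": shift(conf_b, -0.2),
}

names = list(fragments)
dist = [[rmsd(fragments[p], fragments[q]) for q in names] for p in names]

# eps (in Angstrom) and min_pts are arbitrary choices for this toy data set.
labels = dbscan(dist, eps=1.0, min_pts=2)
for name, label in zip(names, labels):
    print(name, "-> structural group", label)
```

In this sketch a region would count as a candidate SSR when the clustering splits the homologs into more than one structural group. With real data, the fragment coordinates would come from a 3D-alignment such as the one Mustguseal constructs, and the thresholds would need to be chosen for the protein family at hand.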

Keywords: (H)DBSCAN, (Hierarchical) Density-Based Spatial Clustering of Applications with Noise; 3D-structure analysis; Drug discovery; Machine learning; OPTICS, Ordering Points to Identify the Clustering Structure; Protein design; Protein superfamilies; RMSD, Root Mean Square Deviation; SDR/SDP, Specificity-Determining Residue/Position; SSP, Subfamily-Specific Position; SSR, Subfamily-Specific Regions; Specificity-determining positions; Structure-function relationship.